Overlapped block motion compensation based on blended predictors

ABSTRACT

Exemplary video processing methods and apparatuses for coding a current block. One implementation operates by receiving input video data associated with a current block in a current picture; determining one or more Motion Vectors (MVs) for generating an OBMC region; generating one or more converted MVs by changing said one or more MVs to one or more integer MVs or changing a MV component of said one or more MVs to an integer component; deriving the OBMC region by motion compensation using said one or more converted MVs; applying OBMC by blending an OBMC predictor in the OBMC region with an original predictor; and encoding or decoding the current block.

CROSS REFERENCE TO RELATED APPLICATION

The present invention is a Continuation of U.S. patent application Ser. No. 17/047,443, filed on Oct. 14, 2020, which is a 371 National Phase Application, Serial No. PCT/CN2019/082675, filed on Apr. 15, 2019, which claims priority to U.S. Provisional Patent Application, Ser. No. 62/657,995, filed on Apr. 16, 2018, entitled “Simplified overlapped block motion compensation for subblock mode”, U.S. Provisional Patent Application, Serial No. U.S. 62/750,279, filed on Oct. 25, 2018, entitled “Methods of Overlapped Blocks Motion Compensation with modified MV”, and U.S. Provisional Patent Application, Serial No. U.S. 62/751,755, filed on Oct. 29, 2018, entitled “Method of Overlapped Blocks Motion Compensation with modified MV and MV constraints”. The U.S. Provisional Patent Applications are hereby incorporated by reference in their entireties.

FIELD OF THE INVENTION

The present invention relates to video processing methods and apparatuses in video encoding and decoding systems. In particular, the present invention relates to overlapped sub-block motion compensation or simplified overlapped block motion compensation.

BACKGROUND AND RELATED ART

The High-Efficiency Video Coding (HEVC) standard is the latest video coding standard developed by the Joint Collaborative Team on Video Coding (JCT-VC) group of video coding experts from ITU-T Study Group. The HEVC standard improves the video compression performance of its preceding standard H.264/AVC to meet the demand for higher picture resolutions, higher frame rates, and better video qualities. During development of the HEVC standard, several proposals associated with Overlapped Block Motion Compensation (OBMC) were made to improve coding efficiency.

OBMC

The fundamental principle of OBMC is to find a Linear Minimum Mean Squared Error (LMMSE) estimate of a pixel intensity value based on motion compensated signals derived from its nearby block Motion Vectors (MVs). From an estimation-theoretic perspective, these MVs are regarded as different plausible hypotheses for its true motion, and to maximize coding efficiency, the weights for the MVs are determined to minimize the mean squared prediction error subject to the unit-gain constraint. OBMC was proposed to improve visual quality of reconstructed video while providing coding gain for boundary pixels. In an example of applying OBMC to a geometry partition, since two different MVs are used for motion compensation, pixels at the partition boundary typically have large discontinuities and result in visual artifacts such as block artifacts. These discontinuities decrease the transform efficiency. For example, when two regions created by a geometry partition are denoted as region 1 and region 2, a pixel from region 1 is defined as a boundary pixel if any of its four connected neighboring pixels (i.e. left, top, right, and bottom pixels) belongs to region 2, and a pixel from region 2 is defined as a boundary pixel if any of its four connected neighboring pixels belongs to region 1. FIG. 1 illustrates an example of boundary pixels between two regions of a block. Grey-shaded pixels 122 belong to the boundary of a first region 12 at the top-left half of the block, and white-shaded pixels 142 belong to the boundary of a second region 14 at the bottom-right half of the block. For each boundary pixel, motion compensation is performed using a weighted sum of motion predictors retrieved according to the MVs of the first region 12 and second region 14. The weights are ¾ for the predictor retrieved using the MV of the region containing the boundary pixel and ¼ for the predictor retrieved using the MV of the other region.
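The boundary-pixel definition and the (¾, ¼) weighting described above can be sketched as follows. This is a minimal illustration, not a normative procedure: the function names `boundary_mask` and `blend_boundary`, the list-of-rows region map, and the integer rounding offset are all assumptions introduced here.

```python
def boundary_mask(region):
    """Mark boundary pixels of a two-region partition.

    `region` is a list of rows whose entries are the region label (1 or 2).
    A pixel is a boundary pixel if any of its four connected neighbors
    (left, top, right, bottom) carries the other label.
    """
    h, w = len(region), len(region[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, -1), (-1, 0), (0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and region[ny][nx] != region[y][x]:
                    mask[y][x] = True
    return mask


def blend_boundary(own_pred, other_pred):
    """Weighted sum with 3/4 for the predictor of the pixel's own region
    and 1/4 for the predictor of the other region.

    The "+ 2" rounding offset is an assumption of this sketch.
    """
    return (3 * own_pred + other_pred + 2) >> 2
```

For a 3×3 block split along its anti-diagonal, only the pixels touching the other region are flagged, and a boundary pixel with predictors 100 (own region) and 60 (other region) blends to 90.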

OBMC is also used to smooth boundary pixels of symmetrical motion partitions such as two 2N×N or N×2N Prediction Units (PUs) partitioned from a 2N×2N Coding Unit (CU). OBMC is applied to the horizontal boundary of two 2N×N PUs and the vertical boundary of two N×2N PUs. Pixels at the partition boundary may have large discontinuities as partitions are reconstructed using different MVs. OBMC is applied to alleviate visual artifacts and improve transform/coding efficiency. FIG. 2A demonstrates an example of applying OBMC to two 2N×N blocks and FIG. 2B demonstrates an example of applying OBMC to two N×2N blocks. Grey pixels in FIG. 2A or FIG. 2B are pixels belonging to Partition 0 and white pixels are pixels belonging to Partition 1. The overlapped region in a luminance (luma) component is defined as two rows of pixels on each side of the horizontal boundary and two columns of pixels on each side of the vertical boundary. For pixels which are one row or one column apart from the partition boundary, i.e. pixels labeled as A in FIG. 2A and FIG. 2B, OBMC weighting factors are (¾, ¼) for the original predictor and OBMC predictor respectively. For pixels which are two rows or two columns apart from the partition boundary, i.e., pixels labeled as B, OBMC weighting factors are (⅞, ⅛) for the original predictor and OBMC predictor respectively. For chrominance (chroma) components, the overlapped region is defined as one row of pixels on each side of the horizontal boundary and one column of pixels on each side of the vertical boundary, and the weighting factors are (¾, ¼) for the original predictor and OBMC predictor respectively.
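The distance-dependent weighting for symmetrical partitions can be written as a small lookup. This is a sketch under stated assumptions: the table layout, the name `blend_sample`, and the rounding to nearest are illustrative choices, while the weights themselves are the ones listed above.

```python
# distance from the partition boundary -> (weight of original predictor,
#                                          weight of OBMC predictor, denominator)
LUMA_WEIGHTS = {1: (3, 1, 4), 2: (7, 1, 8)}   # pixels labeled A and B
CHROMA_WEIGHTS = {1: (3, 1, 4)}               # one row or column only


def blend_sample(orig, obmc, distance, is_luma=True):
    """Blend one sample; samples outside the overlapped region are unchanged."""
    table = LUMA_WEIGHTS if is_luma else CHROMA_WEIGHTS
    if distance not in table:
        return orig
    w_orig, w_obmc, den = table[distance]
    # Rounding offset den // 2 is an assumption of this sketch.
    return (w_orig * orig + w_obmc * obmc + den // 2) // den
```

With an original predictor of 80 and an OBMC predictor of 160, a luma sample one row from the boundary becomes 100 and a sample two rows away becomes 90; a chroma sample two rows away is left at 80.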

Skip and Merge

Skip and Merge modes were proposed and adopted in the HEVC standard to increase the coding efficiency of motion information by inheriting the motion information from spatially neighboring blocks or a temporally collocated block. To code a PU in Skip or Merge mode, instead of signaling motion information, only an index representing a final candidate selected from a candidate set is signaled. The motion information reused by the PU coded in Skip or Merge mode includes a motion vector (MV), an inter prediction indicator, and a reference picture index of the selected final candidate. It is noted that if the selected final candidate is a temporal motion candidate, the reference picture index is always set to zero. Prediction residual is coded when the PU is coded in Merge mode; however, the Skip mode further skips signaling of the prediction residual as the residual data of a PU coded in Skip mode is forced to be zero.

A Merge candidate set in the HEVC standard for a current PU 30 consists of four spatial motion candidates associated with neighboring blocks of the current PU 30 and one temporal motion candidate associated with a collocated PU 32 of the current PU 30. As shown in FIG. 3, the first Merge candidate is a left predictor A₁ 312, the second Merge candidate is a top predictor B₁ 314, the third Merge candidate is a right above predictor B₀ 313, and a fourth Merge candidate is a left below predictor A₀ 311. A left above predictor B₂ 315 is included in the Merge candidate set to replace an unavailable spatial predictor. A fifth Merge candidate is the first available one of the temporal predictors T_(BR) 321 and T_(CTR) 322. The encoder selects one final candidate from the candidate set for each PU coded in Skip or Merge mode based on motion vector competition such as through a Rate-Distortion Optimization (RDO) decision, and an index representing the selected final candidate is signaled to the decoder. The decoder selects the same final candidate from the candidate set according to the index transmitted in the video bitstream. Since the derivations of Skip and Merge candidates are similar, the “Merge” mode referred to hereafter may correspond to Merge mode as well as Skip mode for convenience.

Sub-block motion compensation is employed in many recently developed coding tools such as subblock Temporal Motion Vector Prediction (sbTMVP), Spatial-Temporal Motion Vector Prediction (STMVP), Pattern-based Motion Vector Derivation (PMVD), and Affine Motion Compensation Prediction (MCP) to increase the accuracy of the prediction process. A CU or a PU in sub-block motion compensation is divided into multiple sub-blocks, and these sub-blocks within the CU or PU may have different reference pictures and different MVs. A high bandwidth is therefore demanded for sub-block motion compensation especially when MVs of each sub-block are very diverse. Some of the sub-block motion compensation coding tools are described in the following paragraphs.

Sub-PU TMVP

Subblock Temporal Motion Vector Prediction (Subblock TMVP, SbTMVP) is applied to the Merge mode by including at least one SbTMVP candidate as a Merge candidate in the candidate set. SbTMVP is also referred to as Alternative Temporal Motion Vector Prediction (ATMVP). A current PU is partitioned into smaller sub-PUs, and corresponding temporal collocated motion vectors of the sub-PUs are searched. An example of the SbTMVP technique is illustrated in FIG. 4, where a current PU 41 of size M×N is divided into (M/P)×(N/Q) sub-PUs, each sub-PU is of size P×Q, where M is divisible by P and N is divisible by Q. The detailed algorithm of the SbTMVP mode may be described in three steps as follows.

In step 1, an initial motion vector is assigned for the current PU 41, denoted as vec_init. The initial motion vector is typically the first available candidate among spatial neighboring blocks. For example, List X is the first list for searching collocated information, and vec_init is set to List X MV of the first available spatial neighboring block, where X is 0 or 1. The value of X (0 or 1) depends on which list is better for inheriting motion information; for example, List 0 is the first list for searching when the Picture Order Count (POC) distance between the reference picture and current picture is closer than the POC distance in List 1. List X assignment may be performed at slice level or picture level. After obtaining the initial motion vector, a “collocated picture searching process” begins to find a main collocated picture, denoted as main_colpic, for all sub-PUs in the current PU. The reference picture selected by the first available spatial neighboring block is searched first; after that, all reference pictures of the current picture are searched sequentially. For B-slices, after searching the reference picture selected by the first available spatial neighboring block, the search starts from a first list (List 0 or List 1) reference index 0, then index 1, then index 2, until the last reference picture in the first list. When the reference pictures in the first list have all been searched, the reference pictures in a second list are searched one after another. For P-slices, the reference picture selected by the first available spatial neighboring block is searched first, followed by all reference pictures in the list starting from reference index 0, then index 1, then index 2, and so on.
During the collocated picture searching process, “availability checking” checks, for each searched picture, whether the collocated sub-PU around the center position of the current PU pointed to by vec_init_scaled is coded in an inter or intra mode. Vec_init_scaled is the MV with appropriate MV scaling from vec_init. Some embodiments of determining “around the center position” are a center pixel (M/2, N/2) in a PU of size M×N, a center pixel in a center sub-PU, or a mix of the center pixel or the center pixel in the center sub-PU depending on the shape of the current PU. The availability checking result is true when the collocated sub-PU around the center position pointed to by vec_init_scaled is coded in an inter mode. The current searched picture is recorded as the main collocated picture main_colpic and the collocated picture searching process finishes when the availability checking result for the current searched picture is true. The MV around the center position is used and scaled for the current block to derive a default MV if the availability checking result is true. If the availability checking result is false, that is, when the collocated sub-PU around the center position pointed to by vec_init_scaled is coded in an intra mode, the process goes on to search a next reference picture. MV scaling is needed during the collocated picture searching process when the reference picture of vec_init is not equal to the original reference picture. The MV is scaled depending on temporal distances between the current picture and the reference picture of vec_init and the searched reference picture, respectively. After MV scaling, the scaled MV is denoted as vec_init_scaled.
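The search order of the collocated picture searching process in step 1 can be sketched as below. The function name `search_main_colpic`, the picture handles, and the `is_inter_at_center` callback (standing in for the availability check with MV scaling) are hypothetical; the sketch also does not skip a picture that appears twice in the order, which the text neither requires nor forbids.

```python
def search_main_colpic(first_ref, first_list, second_list, is_inter_at_center):
    """Return the first picture that passes availability checking, or None.

    first_ref          -- reference picture of the first available spatial neighbor
    first_list         -- reference pictures of the first list, in index order
    second_list        -- reference pictures of the second list (empty for P-slices)
    is_inter_at_center -- callable(picture) -> True if the collocated sub-PU
                          around the center position is inter coded
    """
    order = [first_ref] + list(first_list) + list(second_list)
    for picture in order:
        if is_inter_at_center(picture):
            return picture  # recorded as main_colpic; the search finishes
    return None
```

For a P-slice, `second_list` is simply passed as an empty list, so the same routine covers both slice types.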

In step 2, a collocated location in main_colpic is located for each sub-PU. For example, corresponding location 421 and location 422 for sub-PU 411 and sub-PU 412 are first located in the temporal collocated picture 42 (main_colpic). The collocated location for a current sub-PU i is calculated as follows:

collocated location x = Sub-PU_i_x + vec_init_scaled_i_x (integer part) + shift_x,

collocated location y = Sub-PU_i_y + vec_init_scaled_i_y (integer part) + shift_y,

where Sub-PU_i_x represents a horizontal left-top location of sub-PU i inside the current picture, Sub-PU_i_y represents a vertical left-top location of sub-PU i inside the current picture, vec_init_scaled_i_x represents a horizontal component of the scaled initial motion vector for sub-PU i (vec_init_scaled_i), vec_init_scaled_i_y represents a vertical component of vec_init_scaled_i, and shift_x and shift_y represent a horizontal shift value and a vertical shift value respectively. To reduce the computational complexity, only integer locations of Sub-PU_i_x and Sub-PU_i_y, and integer parts of vec_init_scaled_i_x and vec_init_scaled_i_y are used in the calculation. In FIG. 4, the collocated location 425 is pointed to by vec_init_sub_0 423 from location 421 for sub-PU 411 and the collocated location 426 is pointed to by vec_init_sub_1 424 from location 422 for sub-PU 412.
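The step 2 calculation can be illustrated with a short sketch. Two points are assumptions made here and are not stated in the text: MVs are stored as integers in quarter-sample units (`frac_bits=2`), and the "integer part" is taken with an arithmetic right shift, which rounds toward negative infinity.

```python
def collocated_location(sub_pu_x, sub_pu_y, vec_init_scaled,
                        shift_x=0, shift_y=0, frac_bits=2):
    """Return (x, y) of the collocated location for one sub-PU.

    vec_init_scaled is (mv_x, mv_y) in units of 1 / 2**frac_bits sample.
    Only the integer part of each MV component is used.
    """
    int_x = vec_init_scaled[0] >> frac_bits  # integer part (floor, assumed)
    int_y = vec_init_scaled[1] >> frac_bits
    return sub_pu_x + int_x + shift_x, sub_pu_y + int_y + shift_y
```

For a sub-PU at (16, 8) and a scaled MV of (9, -6) in quarter-sample units, the integer parts are 2 and -2, so the collocated location is (18, 6) before any shift is applied.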

In step 3 of the SbTMVP mode, motion information (MI) for each sub-PU, denoted as SubPU_MI_i, is obtained from collocated_picture_i_L0 and collocated_picture_i_L1 on collocated location x and collocated location y. MI is defined as a set of {MV_x, MV_y, reference lists, reference index, and other merge-mode-sensitive information, such as a local illumination compensation flag}. Moreover, MV_x and MV_y may be scaled according to the temporal distance relation between a collocated picture, current picture, and reference picture of the collocated MV. If MI is not available for some sub-PU, MI of a sub-PU around the center position will be used, or in other words, the default MV will be used. As shown in FIG. 4, subPU0_MV 427 obtained from the collocated location 425 and subPU1_MV 428 obtained from the collocated location 426 are used to derive predictors for sub-PU 411 and sub-PU 412 respectively. Each sub-PU in the current PU 41 derives its own predictor according to the MI obtained on the corresponding collocated location.

STMVP

In JEM-3.0, a Spatial-Temporal Motion Vector Prediction (STMVP) is used to derive a new candidate to be included in a candidate set for Skip or Merge mode. Motion vectors of sub-blocks are derived recursively following a raster scan order using temporal and spatial motion vector predictors. FIG. 5 illustrates an example of one CU with four sub-blocks and its neighboring blocks for deriving a STMVP candidate. The CU in FIG. 5 is 8×8 containing four 4×4 sub-blocks, A, B, C and D, and neighboring N×N blocks in the current picture are labeled as a, b, c, and d. The STMVP candidate derivation for sub-block A starts by identifying its two spatial neighboring blocks. The first neighboring block c is a N×N block above sub-block A, and the second neighboring block b is a N×N block to the left of sub-block A. Other N×N blocks above sub-block A, from left to right, starting at block c, are checked if block c is unavailable or block c is intra coded. Other N×N blocks to the left of sub-block A, from top to bottom, starting at block b, are checked if block b is unavailable or block b is intra coded. Motion information obtained from the two neighboring blocks for each list is scaled to a first reference picture for a given list. A temporal motion vector predictor (TMVP) of sub-block A is then derived by following the same procedure of TMVP derivation as specified in the HEVC standard. Motion information of a collocated block at location D is fetched and scaled accordingly. Finally, all available motion vectors are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-block.
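The final averaging step of the STMVP derivation, applied once per reference list, might look like the sketch below. The name `average_available_mvs`, the use of `None` for an unavailable predictor, and the floating-point result (no codec-specific rounding) are assumptions of this illustration.

```python
def average_available_mvs(candidates):
    """Average the available MVs of one reference list.

    candidates -- list of (mv_x, mv_y) tuples, e.g. the above spatial,
                  left spatial, and temporal predictors; None marks an
                  unavailable predictor.
    Returns the averaged (mv_x, mv_y), or None if nothing is available.
    """
    available = [mv for mv in candidates if mv is not None]
    if not available:
        return None
    count = len(available)
    avg_x = sum(mv[0] for mv in available) / count
    avg_y = sum(mv[1] for mv in available) / count
    return (avg_x, avg_y)
```

If the left neighbor is unavailable and the other two predictors are (4, 8) and (8, -2), the sub-block receives (6.0, 3.0) for that list.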

PMVD

A Pattern-based MV Derivation (PMVD) method, also referred to as FRUC (Frame Rate Up Conversion) or DMVR (Decoder-side MV Refinement), consists of bilateral matching for a bi-prediction block and template matching for a uni-prediction block. A FRUC_mrg_flag is signaled when Merge or Skip flag is true, and if FRUC_mrg_flag is true, a FRUC_merge_mode is signaled to indicate whether the bilateral matching Merge mode or the template matching Merge mode is selected. Both bilateral matching Merge mode and template matching Merge mode consist of two-stage matching: the first stage is PU-level matching, and the second stage is sub-PU-level matching. In the PU-level matching, multiple initial MVs in LIST_0 and LIST_1 are selected respectively. These MVs include MVs from Merge candidates (i.e., conventional Merge candidates such as those specified in the HEVC standard) and MVs from temporal derived MVPs. Two different starting MV sets are generated for two lists. For each MV in one list, a MV pair is generated, composed of this MV and the mirrored MV that is derived by scaling the MV to the other list. For each MV pair, two reference blocks are compensated by using this MV pair. The Sum of Absolute Differences (SAD) of these two blocks is calculated. The MV pair with the smallest SAD is selected as the best MV pair. Then a diamond search is performed to refine the MV pair. The refinement precision is ⅛-pel. The refinement search range is restricted within ±8 pixels. The final MV pair is the PU-level derived MV pair.

The sub-PU-level searching in the second stage searches a best MV pair for each sub-PU. The current PU is divided into sub-PUs, where the depth of sub-PU is signaled in Sequence Parameter Set (SPS) with a minimum sub-PU size of 4×4. Several starting MVs in List 0 and List 1 are selected for each sub-PU, which include the PU-level derived MV pair, zero MV, HEVC collocated TMVP of the current sub-PU and bottom-right block, temporal derived MVP of the current sub-PU, and MVs of left and above PUs or sub-PUs. By using a mechanism similar to that of the PU-level searching, the best MV pair for each sub-PU is selected. Then the diamond search is performed to refine the best MV pair. Motion compensation for each sub-PU is then performed to generate a predictor for each sub-PU.

Affine MCP

Affine Motion Compensation Prediction (Affine MCP) is a technique developed for predicting various types of motion other than translational motion, for example, rotation, zoom in, zoom out, perspective motions and other irregular motions. An exemplary simplified affine transform MCP as shown in FIG. 6A is applied in JEM-3.0 to improve the coding efficiency. An affine motion field of a current block 61 is described by motion vectors 613 and 614 of two control points 611 and 612. The Motion Vector Field (MVF) of a block is described by the following equations:

$\begin{cases} v_{x} = \dfrac{(v_{1x} - v_{0x})}{w}\,x - \dfrac{(v_{1y} - v_{0y})}{w}\,y + v_{0x} \\ v_{y} = \dfrac{(v_{1y} - v_{0y})}{w}\,x + \dfrac{(v_{1x} - v_{0x})}{w}\,y + v_{0y} \end{cases}$

where (v_(0x), v_(0y)) represents the motion vector 613 of the top-left corner control point 611, and (v_(1x), v_(1y)) represents the motion vector 614 of the top-right corner control point 612.

A block based affine transform prediction is applied instead of pixel based affine transform prediction in order to further simplify the affine motion compensation prediction. FIG. 6B illustrates partitioning a current block 62 into sub-blocks and affine MCP is applied to each sub-block. As shown in FIG. 6B, a motion vector of a center sample of each 4×4 sub-block is calculated according to the above equation in which (v_(0x), v_(0y)) represents the motion vector 623 of the top-left corner control point 621, and (v_(1x), v_(1y)) represents the motion vector 624 of the top-right corner control point 622, and then rounded to 1/16 fraction accuracy. Motion compensation interpolation is applied to generate a predictor for each sub-block according to the derived motion vector. After performing motion compensation prediction, the high accuracy motion vector of each sub-block is rounded and stored with the same accuracy as a normal motion vector.
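The block based affine derivation can be sketched as follows. Several details are assumptions of this illustration rather than statements of the text: w is taken to be the block width, the center sample of a 4×4 sub-block at (x0, y0) is taken as (x0 + 2, y0 + 2), MVs are floating-point values in sample units, and Python's built-in `round` stands in for whatever rounding rule an actual codec applies.

```python
def affine_subblock_mvs(v0, v1, width, height, sub=4):
    """Derive one MV per sub-block from two control-point MVs.

    v0 -- (v0x, v0y), MV of the top-left control point
    v1 -- (v1x, v1y), MV of the top-right control point
    Returns a dict mapping the sub-block's top-left (x0, y0) to its MV,
    rounded to 1/16 sample accuracy.
    """
    a = (v1[0] - v0[0]) / width   # (v1x - v0x) / w
    b = (v1[1] - v0[1]) / width   # (v1y - v0y) / w
    mvs = {}
    for y0 in range(0, height, sub):
        for x0 in range(0, width, sub):
            cx, cy = x0 + sub / 2, y0 + sub / 2      # assumed center sample
            vx = a * cx - b * cy + v0[0]
            vy = b * cx + a * cy + v0[1]
            mvs[(x0, y0)] = (round(vx * 16) / 16, round(vy * 16) / 16)
    return mvs
```

When both control-point MVs are equal, every sub-block inherits that MV, which reduces the model to ordinary translational motion compensation.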

A CU or a PU is divided into multiple sub-blocks when coded in one of the sub-block motion compensation coding tools, and these sub-blocks may have different reference pictures and different MVs. A high bandwidth is demanded for sub-block motion compensation and high computational complexity is required for applying OBMC to blocks coded in sub-block motion compensation. FIG. 7A illustrates an example of applying OBMC on a CU coded without sub-block motion compensation mode, and FIG. 7B illustrates an example of applying OBMC on a CU coded with a sub-block motion compensation mode. When OBMC applies to a current sub-block, besides the current motion vectors, motion vectors of four connected neighboring sub-blocks, if available and not identical to the current motion vector, are also used to derive a final predictor for the current sub-block. Multiple predictors derived based on multiple motion vectors are combined to generate the final predictor. In FIG. 7A, a final predictor for a current CU is calculated by using a weighted sum of a current motion compensated predictor C derived by a current MV, an OBMC predictor A′ derived from a MV of an above neighboring block A, and an OBMC predictor B′ derived from a MV of a left neighboring block B. In FIG. 7B, a final predictor for a current sub-block is calculated by using a weighted sum of a current motion compensated predictor C derived by a current MV, an OBMC predictor A′ derived from a MV of an above neighboring block, an OBMC predictor B′ derived from a MV of a left neighboring block, an OBMC predictor D′ derived from a MV of a right sub-block D, and an OBMC predictor E′ derived from a MV of a bottom sub-block E.

An OBMC predictor based on a MV of a neighboring block/sub-block is denoted as PN, with N indicating an index for above, below, left and right neighboring blocks/sub-blocks. An original predictor based on a MV of a current block/sub-block is denoted as PC. If PN is based on motion information of a neighboring block/sub-block that contains the same motion information as the current block/sub-block, OBMC is not performed from this PN. Otherwise, every sample of PN is added to the same sample in PC. In JEM, four rows or four columns of PN are added to PC, and weighting factors for PN are {¼, ⅛, 1/16, 1/32} and weighting factors for PC are {¾, ⅞, 15/16, 31/32} respectively. In cases of applying OBMC in small MC blocks, when a height or width of a coding block is equal to 4 or when a CU is coded with sub-CU mode, only two rows or two columns of PN are added to PC. The weighting factors are {¼, ⅛} and {¾, ⅞} for PN and PC respectively. For PN generated based on motion vectors of a vertically (horizontally) neighboring sub-block, samples in the same row (column) of PN are added to PC with a same weighting factor. The OBMC process generating a final predictor by weighted sum is performed one by one sequentially, which induces high computational complexity and data dependency.
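As an illustration of the row-wise weighting, the sketch below blends a PN derived from an above neighbor into the top rows of PC. The name `blend_from_above`, the list-of-rows layout, and the shift-based integer arithmetic with a rounding offset are assumptions; the weights {¼, ⅛, 1/16, 1/32} for PN and their complements for PC are those listed above.

```python
def blend_from_above(pc, pn, num_rows=4):
    """Blend the top `num_rows` rows of PC with the co-located rows of PN.

    Row r (0 = row nearest the above boundary) uses weight 1 / 2**(r + 2)
    for PN and the complementary weight for PC. Use num_rows=2 for the
    small-block and sub-CU case. Rows beyond num_rows are unchanged.
    """
    out = [row[:] for row in pc]
    for r in range(num_rows):
        shift = r + 2                    # 1/4, 1/8, 1/16, 1/32
        w_pc = (1 << shift) - 1          # 3, 7, 15, 31
        rnd = 1 << (shift - 1)           # rounding offset (assumed)
        out[r] = [(w_pc * c + n + rnd) >> shift
                  for c, n in zip(pc[r], pn[r])]
    return out
```

Every sample in a given row shares one weighting factor, matching the rule for a vertically neighboring block; a left-neighbor PN would apply the same weights column by column.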

OBMC may be switched on and off according to a CU level flag when a CU size is less than or equal to 256 luma samples in JEM. For CUs with size larger than 256 luma samples or not coded with AMVP mode, OBMC is applied by default. OBMC is performed for all Motion Compensation (MC) block boundaries except right and bottom boundaries of a CU when OBMC is enabled. OBMC is applied to both luma and chroma components. A MC block corresponds to a coding block if the CU is coded without sub-block MC, or a MC block corresponds to a sub-block in the CU if coded with sub-block MC.

At the encoder, when OBMC is applied to a CU, the impact is taken into account during the motion estimation stage. OBMC predictors derived using motion information of top and left neighboring blocks are used to compensate the top and left boundaries of an original predictor of the current CU, and then the normal motion estimation process is applied.

OBMC may be performed after the normal Motion Compensation (MC). Bidirectional Optical Flow (BDOF) is separately applied in both OBMC and normal MC if OBMC is performed after normal MC. That is, MC results for the overlapped region between two CUs or PUs are generated by the OBMC process, not by the normal MC process. BDOF is applied to refine these two MC results. Redundant OBMC and BDOF processes may be skipped when two neighboring MVs are the same. However, the required bandwidth and MC operations for the overlapped region are increased compared to integrating the OBMC process into the normal MC process. Since fractional-pixel motion vectors are supported in newer coding standards, additional reference pixels around the reference block are retrieved according to the number of interpolation taps for interpolation calculations. In one example, a current PU size is 16×8, an OBMC region is 16×2, and an 8-tap interpolation filter is used in MC. If OBMC is performed after normal MC, (16+7)×(8+7)+(16+7)×(2+7)=552 reference pixels per reference list are required for generating the current PU and related OBMC regions. If the OBMC operations are combined with normal MC into one stage, only (16+7)×(8+2+7)=391 reference pixels per reference list are required for the current PU and related OBMC.
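The reference pixel counts in this example can be checked with a few lines of arithmetic. The helper name `ref_pixels` is hypothetical; it simply encodes the rule that a T-tap interpolation filter needs T − 1 extra samples in each dimension.

```python
def ref_pixels(width, height, taps=8):
    """Reference pixels needed to interpolate a width x height region."""
    return (width + taps - 1) * (height + taps - 1)


# OBMC performed after normal MC: the 16x8 PU and the 16x2 OBMC region
# are fetched separately.
separate = ref_pixels(16, 8) + ref_pixels(16, 2)

# OBMC combined with normal MC: one fetch covering 16x(8+2).
combined = ref_pixels(16, 8 + 2)

saving = separate - combined
```

Under these figures the combined scheme fetches 161 fewer reference pixels per reference list, because the 7 extra rows required by the interpolation filter are fetched once rather than twice.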

There are two different implementation schemes for integrating OBMC in normal MC: pre-generation and on-the-fly. The first scheme is pre-generating OBMC regions and storing OBMC predictors of the OBMC regions in a local buffer for neighboring blocks when processing a current block by OBMC. The corresponding OBMC predictors are therefore available in the local buffer at the time of processing the neighboring block. FIG. 8A illustrates a reference block fetched for generating a predictor for a current block without generating OBMC regions. FIG. 8B illustrates a reference block fetched for generating a predictor for a current block as well as OBMC regions. The reference block is located according to the motion vector of the current block (MV1 in FIG. 8A and FIG. 8B). In this example, the size of the current block is W×H, an 8-tap interpolation filter is used for motion compensation, a width of a right OBMC region is W′, and a height of a bottom OBMC region is H′. In one example, W′ is four pixels and H′ is also four pixels; in this case, four additional columns are fetched to generate the right OBMC region and four additional rows are fetched to generate the bottom OBMC region. The number of reference samples in the reference block shown in FIG. 8A that need to be fetched from memory is (3+W+4)×(3+H+4). The number of reference samples in the reference block shown in FIG. 8B fetched from memory for generating the predictors for the current block and the two OBMC regions increases to (3+W+W′+4)×(3+H+H′+4). The right OBMC region and the bottom OBMC region are stored in buffers for the OBMC process of right and bottom neighboring blocks. Additional line buffers across Coding Tree Units (CTUs) are required to store the MC results of the bottom OBMC region. The second implementation scheme generates OBMC regions for a current block just before blending OBMC predictors and an original predictor of the current block.
For example, when applying OBMC on a current sub-block, OBMC predictors are not yet available in the local buffer, so an original predictor is derived according to the MV of the current sub-block, one or more OBMC predictors are also derived according to MVs of one or more neighboring blocks, and then the original predictor is blended with the one or more OBMC predictors.
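For the pre-generation scheme, the growth of the fetched reference block can be estimated directly from the two expressions (3+W+4)×(3+H+4) and (3+W+W′+4)×(3+H+H′+4). The function name `fetched_samples` and the 16×16 block size used in the example are assumptions chosen only for illustration.

```python
def fetched_samples(w, h, obmc_w=0, obmc_h=0, taps=8):
    """Reference samples fetched for a w x h block plus a right OBMC
    region of width obmc_w and a bottom OBMC region of height obmc_h."""
    before = taps // 2 - 1   # 3 samples to the left / above for 8 taps
    after = taps // 2        # 4 samples to the right / below for 8 taps
    return (before + w + obmc_w + after) * (before + h + obmc_h + after)


without_obmc = fetched_samples(16, 16)            # FIG. 8A case
with_obmc = fetched_samples(16, 16, 4, 4)         # FIG. 8B case, W' = H' = 4
```

For a 16×16 block this is 529 samples without OBMC regions and 729 samples with them, an increase of roughly 38 percent in fetched data for the same block.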

BRIEF SUMMARY OF THE INVENTION

Exemplary methods of video processing in a video coding system perform overlapped sub-block motion compensation. An exemplary video processing method receives input video data associated with a current block in a current picture, partitions the current block into multiple overlapped sub-blocks according to an overlapped sub-block partition, and determines one or more sub-block MVs for each sub-block. Each sub-block in the current block is overlapped with one or more other sub-blocks in a horizontal direction, a vertical direction, or both the horizontal and vertical directions according to the overlapped sub-block partition. A selection of the overlapped sub-block partition is either predefined, explicitly signaled at a sequence level, picture level, tile group level, or slice level in a video bitstream, or implicitly decided according to motion information, a sub-block size, or a prediction mode of the current block. The exemplary video processing method derives an initial predictor for each sub-block in the current block by motion compensation using the one or more sub-block MVs. In some embodiments, the current block only contains overlapped regions, and in some other embodiments, the current block contains both overlapped regions and non-overlapped regions. A final predictor for each overlapped region is derived by blending the initial predictors of the overlapped region. For the non-overlapped regions, the initial predictors are used as there is only one initial predictor associated with each non-overlapped region. The current block is encoded or decoded based on the final predictors of the overlapped regions and the initial predictors of the non-overlapped region if available.

In some embodiments, the final predictor is derived by blending the initial predictors of the overlapped region using a weighted sum. Weighting factors for the initial predictors may be position dependent or may depend on a number of overlapped sub-blocks.

Some exemplary video processing methods for processing blocks with Overlapped Block Motion Compensation (OBMC) in a video coding system receive input video data associated with a current block in a current picture, determine one or more MVs, for example one MV for uni-prediction or two MVs for bi-prediction, generate one or more converted MVs by changing the one or more MVs to one or more integer MVs or changing a MV component of the one or more MVs to an integer component, and derive an OBMC region by motion compensation using the one or more converted MVs. The exemplary video processing methods apply OBMC by blending an OBMC predictor in the OBMC region with an original predictor, and encode or decode the current block.

A first OBMC implementation scheme pre-generates at least one OBMC region for at least one neighboring block when processing the current block, so the OBMC region is derived from the converted MV(s) generated from one or more current MVs of the current block. In some embodiments, the converted MV(s) is used to pre-generate a right or bottom OBMC region for a right or bottom neighboring block of the current block, and an OBMC predictor in the OBMC region is blended with the original predictor of the right or bottom neighboring block when processing the right or bottom neighboring block. The converted MV(s) may be used to pre-generate both the right and bottom OBMC regions for the right and bottom neighboring blocks according to an embodiment. In another embodiment, for deriving the right OBMC region for the right neighboring block, the converted MV(s) is generated by changing a horizontal component of the MV(s) of the current block to an integer. Similarly, for deriving the bottom OBMC region for the bottom neighboring block, the converted MV(s) is generated by changing a vertical component of the MV(s) of the current block to an integer.
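The MV conversion described above can be sketched as a per-component rounding. The representation is an assumption: MVs are integers in quarter-sample units (`frac_bits=2`), and conversion rounds to the nearest integer sample position with ties rounded up; this passage does not fix either choice. The name `to_integer_mv` is hypothetical.

```python
def to_integer_mv(mv, frac_bits=2, horizontal=True, vertical=True):
    """Change an MV, or only one of its components, to an integer position.

    mv is (mv_x, mv_y) in units of 1 / 2**frac_bits sample. The result is
    kept in the same units, so a converted component is a multiple of
    2**frac_bits.
    """
    def round_component(value):
        half = 1 << (frac_bits - 1)
        return ((value + half) >> frac_bits) << frac_bits

    x, y = mv
    return (round_component(x) if horizontal else x,
            round_component(y) if vertical else y)


mv = (5, -3)                                        # (1.25, -0.75) samples
both = to_integer_mv(mv)                            # both components
right_region = to_integer_mv(mv, vertical=False)    # horizontal only
bottom_region = to_integer_mv(mv, horizontal=False) # vertical only
```

Converting only the horizontal component corresponds to the embodiment for the right OBMC region, and converting only the vertical component to the embodiment for the bottom OBMC region.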

In some other embodiments, the OBMC region is derived using a converted MV(s) or an original MV(s) depending on the prediction direction of the current block, neighboring block, or both. For example, the OBMC region is derived by motion compensation using the converted MV(s) if the prediction direction of the current block is bi-prediction; otherwise, the OBMC region is derived using the MV(s) without conversion if the current block is uni-predicted. In another example, the OBMC region is derived by motion compensation using the converted MV(s) if either the current block or neighboring block is bi-predicted; otherwise, the OBMC region is derived using the MV(s) without conversion if both the current block and neighboring block are uni-predicted. Pre-generation of the OBMC region may be adaptive according to one or more criteria in some embodiments. In one embodiment, the OBMC region is pre-generated only when both horizontal and vertical components of the MV(s) of the current block are not integers, or in another embodiment, the OBMC region is pre-generated only when one of the horizontal and vertical components of the MV(s) of the current block is not an integer. In yet another embodiment, the OBMC region is pre-generated only when a predefined component of the MV(s) of the current block is not an integer, and the predefined component depends on whether the OBMC region is for the right or bottom neighboring block. In yet another embodiment, the OBMC region is pre-generated by one of List 0 and List 1 MVs when the current block is bi-predicted with one integer MV and one fractional MV, and the OBMC region is pre-generated by the converted MV generated from the fractional MV. Horizontal and vertical components of the fractional MV are both fractional or only a predefined component of the fractional MV is fractional.
A weighting factor for the OBMC predictor in the pre-generated OBMC region is reduced when blending the OBMC predictor with the original predictor of the right or bottom neighboring block. For example, the weighting factor is reduced to half of an original weighting factor, where the original weighting factor is used for an OBMC predictor in a pre-generated OBMC region when the current block is not bi-predicted with one integer MV and one fractional MV.

A second OBMC implementation scheme generates both the OBMC predictor in the OBMC region and an original predictor for the current block at the time of processing the current block. The OBMC region is derived from the converted MV(s) generated from one or more neighboring MVs of a neighboring block, the OBMC region is derived for the current block, and the OBMC region is blended with the original predictor of the current block. Some embodiments of the second OBMC implementation scheme derive the OBMC region using the converted MV(s) if a prediction direction of a neighboring block is bi-prediction, or using the MV(s) without conversion if the prediction direction of the neighboring block is uni-prediction. Some other embodiments derive the OBMC region using the converted MV(s) if the current block or neighboring block is bi-predicted, or using the MV(s) without conversion if both the current block and neighboring block are uni-predicted. In yet another embodiment, the OBMC region is derived using the converted MV(s) only if the current block is bi-predicted with integer MVs and the neighboring block is bi-predicted.

Some examples of changing the MV(s) to integer MV(s) are truncating or rounding into integer MV(s). In some exemplary embodiments, the method checks a similarity of MVs of the current block and a neighboring block, and adaptively skips blending the OBMC predictor in the OBMC region with the original predictor according to the similarity of the MVs. The MV similarity checking may be performed before or after generating the converted MV(s).

In a variation of the video processing method, an embodiment sets a maximum number of OBMC blending lines in the OBMC region to 3 when the current block is a luminance (luma) block, and sets a maximum number of OBMC blending lines to 1 or 2 when the current block is a chrominance (chroma) block. Another embodiment of the video processing method sets a number of OBMC blending lines in the OBMC region for a luma component to 3 if a fractional part of an absolute value of the MV(s) is larger than 0.5 or is larger than or equal to 0.5; otherwise, it sets the number of OBMC blending lines in the OBMC region for the luma component to 4. An embodiment of the video processing method sets a number of OBMC blending lines in the OBMC region for chroma components to 1 if a fractional part of an absolute value of the MV(s) is larger than 0.5 or is larger than or equal to 0.5; otherwise, it sets the number of OBMC blending lines in the OBMC region for the chroma components to 2. Yet another embodiment of the video processing method sets a number of OBMC blending lines in the OBMC region for chroma components to 1 if the number of OBMC blending lines in the OBMC region for the luma component is reduced to 3; otherwise, it sets the number of OBMC blending lines in the OBMC region for the chroma components to 2. In yet another embodiment, when the OBMC region is derived by the converted MV(s) of a top neighboring block, a number of OBMC blending lines in the OBMC region for a luma component is 3 only if a fractional part of an absolute value of the MV(s) in a vertical direction is larger than 0.5 or is larger than or equal to 0.5; otherwise, the number of OBMC blending lines for the luma component is 4.
When the OBMC region is derived by the converted MV(s) of a left neighboring block, a number of OBMC blending lines in the OBMC region for a luma component is 3 only if a fractional part of an absolute value of the MV(s) in a horizontal direction is larger than 0.5 or is larger than or equal to 0.5; otherwise, the number of OBMC blending lines for the luma component is 4. Similarly, when the OBMC region is derived by the converted MV(s) of a top neighboring block, a number of OBMC blending lines in the OBMC region for chroma components is 1 only if a fractional part of an absolute value of the MV(s) in a vertical direction is larger than 0.5 or is larger than or equal to 0.5, or the number of OBMC blending lines for the luma component is reduced to 3; otherwise, the number of OBMC blending lines for the chroma components is 2. When the OBMC region is derived by the converted MV(s) of a left neighboring block, a number of OBMC blending lines in the OBMC region for chroma components is 1 only if a fractional part of an absolute value of the MV(s) in a horizontal direction is larger than 0.5 or is larger than or equal to 0.5, or the number of OBMC blending lines for the luma component is reduced to 3; otherwise, the number of OBMC blending lines for the chroma components is 2.
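The rule relating the fractional part of a MV component to the number of blending lines may be sketched as follows. This is only an illustrative sketch: the function names, the fixed-point MV representation with `shift_bit` fractional bits (quarter-sample units by default), and the `inclusive` switch between the "larger than 0.5" and "larger than or equal to 0.5" variants are assumptions made for the example and are not part of the described embodiments.

```python
def obmc_blending_lines(mv_component, shift_bit=2, inclusive=True):
    """Return (luma_lines, chroma_lines) for one MV component.

    mv_component is assumed to be a fixed-point integer with shift_bit
    fractional bits (hypothetical representation, e.g. quarter-sample units).
    """
    mask = (1 << shift_bit) - 1
    half = 1 << (shift_bit - 1)           # fixed-point value of 0.5
    frac = abs(mv_component) & mask       # fractional part of |MV|
    reduced = frac >= half if inclusive else frac > half
    luma = 3 if reduced else 4
    chroma = 1 if reduced else 2          # chroma follows the luma reduction
    return luma, chroma


def lines_for_neighbor(mv, neighbor, shift_bit=2, inclusive=True):
    """Pick the component to test: vertical for a top neighboring block,
    horizontal for a left neighboring block. mv is (horizontal, vertical)."""
    component = mv[1] if neighbor == "top" else mv[0]
    return obmc_blending_lines(component, shift_bit, inclusive)


if __name__ == "__main__":
    mv = (5, 2)  # 1.25 samples horizontally, 0.5 sample vertically
    print(lines_for_neighbor(mv, "left"))  # horizontal fraction 0.25
    print(lines_for_neighbor(mv, "top"))   # vertical fraction 0.5
```

With the inclusive variant, a vertical fraction of exactly 0.5 reduces the blending lines for a region derived from a top neighboring block, while the exclusive variant leaves them unchanged.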

Some embodiments of applying OBMC adaptively determine a number of OBMC blending lines for blending an original predictor with an OBMC predictor of a current block. In one embodiment, the number of OBMC blending lines for a left boundary of a current block is determined according to a width of the current block, and/or the number of OBMC blending lines for a top boundary of the current block is determined according to a height of the current block. The original predictor of the current block is derived by motion compensation using one or more current MVs of the current block. The OBMC predictor of a left OBMC region having the number of OBMC blending lines for the left boundary is derived by motion compensation using one or more MVs of a left neighboring block of the current block. The OBMC predictor of a top OBMC region having the number of OBMC blending lines for the top boundary is derived by motion compensation using one or more MVs of a top neighboring block of the current block. The video encoding or decoding system applies OBMC to the current block by blending the OBMC predictor with the original predictor of the current block for the number of OBMC blending lines, and encodes or decodes the current block. For example, the width of a luma block is compared with a predefined threshold to decide whether 2 OBMC blending lines or 4 OBMC blending lines are used at the left boundary of the current block. The height of the luma block is compared with a predefined threshold to decide whether 2 OBMC blending lines or 4 OBMC blending lines are used at the top boundary of the current block. Fewer OBMC blending lines for the left or top boundary are used for blocks with a width or height shorter than the predefined threshold.
Similarly, the width or height of a chroma block may be used to decide the number of OBMC blending lines, for example, 1 OBMC blending line is used at the left boundary if the width of the chroma block is less than a predefined threshold, otherwise 2 OBMC blending lines are used; and/or 1 OBMC blending line is used at the top boundary if the height of the chroma block is less than a predefined threshold, otherwise 2 OBMC blending lines are used. The number of OBMC blending lines may be adaptively determined according to a length of an interpolation filter used in motion compensation, for example, more OBMC blending lines are required when a longer interpolation filter is employed.
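The size-dependent selection described above, with the left boundary driven by the block width and the top boundary driven by the block height, may be sketched as follows. The function name and the default threshold of 8 samples are hypothetical; the text only refers to "a predefined threshold" and does not give its value.

```python
def obmc_lines_by_size(width, height, is_luma=True, threshold=8):
    """Return (left_lines, top_lines) for a block of the given size.

    threshold is a hypothetical value standing in for the predefined
    threshold; blocks narrower (or shorter) than it use fewer lines.
    """
    few, many = (2, 4) if is_luma else (1, 2)
    left_lines = few if width < threshold else many   # left boundary: width
    top_lines = few if height < threshold else many   # top boundary: height
    return left_lines, top_lines


if __name__ == "__main__":
    print(obmc_lines_by_size(4, 16))                               # narrow luma block
    print(obmc_lines_by_size(2, 8, is_luma=False, threshold=4))    # narrow chroma block
```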

The methods of adaptively determining the number of OBMC blending lines may be enabled or disabled according to a flag, for example, 4 OBMC blending lines are used for the luma component and 2 OBMC blending lines are used for the chroma components if the flag indicates that the adaptive number of OBMC blending lines is disabled. In an embodiment, the number of OBMC blending lines for the luma component is adaptively determined according to the width or height of the current block, and the number of OBMC blending lines for chroma components is determined according to the number of OBMC blending lines for the luma component.

Aspects of the disclosure further provide embodiments of an apparatus for processing video data in a video coding system. An embodiment of the apparatus comprises one or more electronic circuits configured for receiving input data of a current block in a current picture, partitioning the current block into multiple overlapped sub-blocks according to an overlapped sub-block partition, determining one or more sub-block MVs for each sub-block, deriving an initial predictor for each sub-block by motion compensation using the one or more sub-block MVs, deriving a final predictor for each overlapped region by blending the initial predictors of the overlapped region, and encoding or decoding the current block based on the final predictors. Another embodiment of the apparatus comprises one or more electronic circuits configured for receiving input video data of a current block, determining one or more MVs, generating one or more converted MVs by changing the one or more MVs to one or more integer MVs or changing a MV component of the one or more MVs to an integer component, deriving an OBMC region by motion compensation using the one or more converted MVs, applying OBMC by blending an OBMC predictor in the OBMC region with an original predictor, and encoding or decoding the current block.

Aspects of the disclosure further provide a non-transitory computer readable medium storing program instructions for causing a processing circuit of an apparatus to perform a video processing method to encode or decode a current block utilizing overlapped sub-blocks according to some embodiments, or to encode or decode a current block with OBMC in which an OBMC region is derived using an integer MV according to some other embodiments.

Other aspects and features of the invention will become apparent to those with ordinary skill in the art upon review of the following descriptions of specific embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments of this disclosure that are proposed as examples will be described in detail with reference to the following figures, and wherein:

FIG. 1 illustrates an example of overlapped motion compensation for a geometry partition.

FIGS. 2A and 2B illustrate examples of OBMC footprint for 2N×N block and N×2N block with different weightings for boundary pixels.

FIG. 3 illustrates positions of spatial and temporal MV candidates for constructing a Merge candidate set.

FIG. 4 illustrates an example of determining sub-block motion vectors for sub-blocks in a current PU according to the SbTMVP technique.

FIG. 5 illustrates an example of determining a Merge candidate for a CU with four sub-blocks according to the STMVP technique.

FIG. 6A illustrates an example of applying affine motion compensation prediction on a current block with two control points.

FIG. 6B illustrates an example of applying block based affine motion compensation prediction with two control points.

FIG. 7A illustrates an example of applying OBMC to a block without sub-block motion compensation mode.

FIG. 7B illustrates an example of applying OBMC to a block with sub-block motion compensation mode.

FIG. 8A illustrates an example of a reference block fetched from the memory for generating a predictor for a current block.

FIG. 8B illustrates an example of a reference block fetched from the memory for generating a predictor for a current block and two OBMC predictors for neighboring blocks.

FIG. 9A illustrates an exemplary non-overlapped sub-block partition.

FIG. 9B illustrates an exemplary overlapped sub-block partition with overlapped regions located in a horizontal direction.

FIG. 9C illustrates an exemplary overlapped sub-block partition with overlapped regions located in a vertical direction.

FIG. 9D illustrates an exemplary overlapped sub-block partition with overlapped regions located in both horizontal and vertical directions.

FIG. 10 is a flowchart showing an exemplary embodiment of processing a current block with overlapped sub-block motion compensation.

FIG. 11 illustrates an example of a reference block fetched from the memory for generating a predictor for a current block and two OBMC predictors for neighboring blocks when a current MV is rounded up to an integer MV for generating the OBMC predictors.

FIG. 12A is a flowchart showing an exemplary embodiment of processing a current block with OBMC using the first OBMC implementation scheme.

FIG. 12B is a flowchart showing an exemplary embodiment of processing a current block with OBMC using the second OBMC implementation scheme.

FIG. 13 illustrates an exemplary system block diagram for a video encoding system incorporating the video processing method according to embodiments of the present invention.

FIG. 14 illustrates an exemplary system block diagram for a video decoding system incorporating the video processing method according to embodiments of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the systems and methods of the present invention, as represented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. In this disclosure, systems and methods are described for motion compensation with an overlapped sub-block partition or Overlapped Block Motion Compensation (OBMC), and each or a combination of the methods may be implemented in a video encoder or video decoder. An exemplary video encoder and decoder implementing one or a combination of the methods are illustrated in FIGS. 13 and 14 respectively. Various embodiments in the disclosure reduce the computation complexity, especially for the interpolation filtering applied in motion compensation, and reduce additional bandwidth required by OBMC in one or both of the OBMC implementation schemes. Systems and methods described herein are organized in sections as follows. The section “Overlapped Sub-block Partition” demonstrates exemplary methods of overlapped sub-block motion compensation which achieve an artifact-reducing effect similar to that of the OBMC technique. The section “Directional OBMC” describes exemplary methods of applying OBMC in only one or more specific directions.
The section “Short-tap Interpolation Filter for OBMC regions” describes exemplary methods of employing a short-tap interpolation filter when generating one or more OBMC regions. The section “Use Integer MVs for Generating OBMC Regions”, followed by the sections “Conditionally Changing to Integer MV”, “Conditionally Pre-generating OBMC Regions”, “Conditionally Skip OBMC Blending”, “Conditionally Generating OBMC Regions”, and “Reduce Blending Lines for OBMC” describe various exemplary methods of converting a MV or a MV component to an integer MV or an integer MV component when generating an OBMC region. The section “Adaptive number of OBMC blending lines” illustrates exemplary methods of generating OBMC regions with a number of OBMC blending lines adaptively determined. The section “Exemplary Flowchart” describes exemplary methods of generating an OBMC region using one or more converted MVs, where the converted MVs are integer MVs or MVs with an integer component. The sections “OBMC Interacts with BiCW” and “OBMC Interacts with BDOF” describe some examples of implementation of OBMC together with the Bi-prediction with CU weights (BiCW) technique and BDOF technique. The section “Video Encoder and Decoder Implementation” together with FIGS. 13 and 14 illustrate a video encoding system and a video decoding system incorporating one or a combination of the described video processing methods.

Overlapped Sub-block Partition

In order to reduce the computational complexity of applying OBMC, affine prediction mode, or other sub-block-based prediction modes on non-overlapped sub-blocks, embodiments of the present invention apply overlapped sub-block motion compensation instead of OBMC. FIG. 9A illustrates an example of partitioning a CU into 16 non-overlapped sub-blocks. FIG. 9B illustrates an example of partitioning a CU into 6 overlapped sub-blocks and FIG. 9C illustrates another example of partitioning a CU into 6 overlapped sub-blocks. In the example of FIG. 9B, sub-block 0 is partially overlapped with sub-block 1, sub-block 1 is partially overlapped with sub-block 2, sub-block 3 is partially overlapped with sub-block 4, and sub-block 4 is partially overlapped with sub-block 5. The overlapped sub-blocks in FIG. 9B only have overlapped regions located at the right and/or left boundary of each sub-block. The video encoder or decoder derives an initial predictor for each of sub-blocks 0, 1, 2, 3, 4, and 5 according to one or more corresponding MVs of the sub-block, and then the video encoder or decoder derives a final predictor for the CU by blending or combining the six initial predictors. For example, the initial predictor of the left part of sub-block 1 is blended with the initial predictor of the right part of sub-block 0, and the initial predictor of the right part of sub-block 1 is blended with the initial predictor of the left part of sub-block 2. The initial predictor of each non-overlapped region is the final predictor for the non-overlapped region, for example, the initial predictor of the left part of sub-block 0 is also the final predictor for the left part of sub-block 0.

In FIG. 9C, sub-block 0 is partially overlapped with sub-block 2, sub-block 2 is partially overlapped with sub-block 4, sub-block 1 is partially overlapped with sub-block 3, and sub-block 3 is partially overlapped with sub-block 5. The overlapped sub-blocks in FIG. 9C only have overlapped regions located at the top and/or bottom boundary of each sub-block. An initial predictor is derived for each sub-block, and the initial predictor of the top part of sub-block 2 is blended with the initial predictor of the bottom part of sub-block 0, and the initial predictor of the bottom part of sub-block 2 is blended with the initial predictor of the top part of sub-block 4. Similarly, the initial predictor of the top part of sub-block 3 is blended with the initial predictor of the bottom part of sub-block 1, and the initial predictor of the bottom part of sub-block 3 is blended with the initial predictor of the top part of sub-block 5.

In some other embodiments, the overlapped regions are located at the top boundary, bottom boundary, left boundary, and right boundary; an example is shown in FIG. 9D. As shown in FIG. 9D, a CU is first partitioned into 16 sub-blocks and one or more corresponding MVs are derived for each sub-block. A first initial predictor is located according to one or two corresponding MVs for each sub-block of the 16 sub-block partition. The same CU is also partitioned into 25 sub-blocks, and each of the 25 sub-blocks is overlapped with one or more sub-blocks of the 16 sub-blocks in both horizontal and vertical directions. New MVs are derived for the 25 sub-blocks for locating second initial predictors. Each pixel of a final predictor for an overlapped region is calculated by blending or combining the pixel values of the two corresponding initial predictors, where one predictor is a first initial predictor derived by the 16 sub-block partition and another predictor is a second initial predictor derived by the 25 sub-block partition.

The overlapped sub-block partition for splitting a block may be explicitly signalled, implicitly decided, or predefined. For example, a selection of the overlapped sub-block partition is signalled at a sequence level, picture level, tile group level or slice level of a video bitstream. An example of implicit decision determines the overlapped sub-block partition according to one or a combination of motion information of the current block, the size of sub-blocks, or prediction mode of the current block. Initial predictors generated by the overlapped sub-blocks in the current block may be combined or blended using a weighted sum to generate a final predictor for the current block. The weights used to generate the final predictor in the overlapped region can be position-dependent according to one embodiment, or each weight of the initial predictor depends on a number of overlapped sub-blocks according to another embodiment.
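As a rough illustration of the embodiment in which each weight depends on the number of overlapped sub-blocks, the following sketch accumulates the initial predictors on a block-sized canvas and divides each sample by the number of sub-blocks covering it. The function name, the list-based sample layout, and the equal-weight rounding rule are assumptions made for the example; position-dependent weights would replace the uniform average.

```python
def blend_overlapped(width, height, sub_blocks):
    """Blend initial predictors of overlapped sub-blocks into a final predictor.

    sub_blocks is a list of (x0, y0, pred) tuples, where pred is a 2-D list
    of integer samples placed with its top-left corner at (x0, y0).
    Every sample of the block is assumed to be covered by at least one
    sub-block (hypothetical precondition of this sketch).
    """
    acc = [[0] * width for _ in range(height)]   # sum of initial predictors
    cnt = [[0] * width for _ in range(height)]   # number of overlapped sub-blocks
    for x0, y0, pred in sub_blocks:
        for dy, row in enumerate(pred):
            for dx, sample in enumerate(row):
                acc[y0 + dy][x0 + dx] += sample
                cnt[y0 + dy][x0 + dx] += 1
    # Equal weight 1/cnt per contributing predictor, with rounding.
    return [[(acc[y][x] + cnt[y][x] // 2) // cnt[y][x] for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    # Two 4-sample-wide sub-blocks overlapping by 2 samples in a 6x1 block.
    left = (0, 0, [[10, 10, 10, 10]])
    right = (2, 0, [[20, 20, 20, 20]])
    print(blend_overlapped(6, 1, [left, right]))
```

In the non-overlapped samples the count is 1, so the initial predictor passes through unchanged as the final predictor, which matches the behavior described for non-overlapped regions.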

Exemplary Flowchart

FIG. 10 illustrates an exemplary flowchart of a video encoding or decoding system processing video data by overlapped sub-block motion compensation. The video encoding or decoding system receives input video data associated with a current block in a current picture in Step S1010. At the encoder side, the input video data corresponds to pixel data to be encoded. At the decoder side, the input video data corresponds to coded data or prediction residual to be decoded. In Step S1020, the video encoding or decoding system partitions the current block into overlapped sub-blocks according to an overlapped sub-block partition. Each sub-block in the current block is overlapped with one or more other sub-blocks at the left and/or right boundary, at the top and/or bottom boundary, or at one or a combination of the left, right, top, and bottom boundaries. The overlapped sub-block partition may be predefined, explicitly signaled in a sequence level, picture level, tile group level or slice level in a video bitstream, or implicitly decided according to one or a combination of the motion information, sub-block size, and prediction mode of the current block. In Step S1030, a sub-block MV is determined for each overlapped sub-block if the sub-block is predicted in uni-prediction, or List 0 and List 1 sub-block MVs are determined for each overlapped sub-block if the sub-block is predicted in bi-prediction. Each overlapped sub-block in the current block is motion compensated by the sub-block MV(s) to derive an initial predictor from a reference picture in Step S1040. The video encoding or decoding system then derives a final predictor for each overlapped region by blending the initial predictors of the overlapped region in Step S1050. There may be non-overlapped regions as well as the overlapped regions in the current block, or there are only overlapped regions in the current block according to various overlapped sub-block partitions.
In Step S1060, the video encoding or decoding system encodes or decodes the current block based on the final predictors of the overlapped regions. If there are also non-overlapped regions in the current block, the video encoding or decoding system encodes or decodes the current block based on the final predictors of the overlapped regions and the initial predictors of the non-overlapped regions.

Directional OBMC

For a CU coded in sub-block mode, conventional OBMC is applied to each sub-block in four directions. Some embodiments of the directional OBMC reduce the number of directions in the OBMC sub-block process. For example, only the OBMC predictor generated from a left block is used to combine with an original predictor of a current sub-block. In another embodiment, only the OBMC predictor generated from an above block is used. The selected direction or directions for applying the OBMC process may be implicitly derived or explicitly signalled. An example of implicitly selecting the OBMC direction decides the direction based on motion information of the current block and neighboring blocks. The motion information includes one or a combination of motion vector, reference frame index, prediction direction, prediction mode, CU size, and sub-block size. In one embodiment, the direction of applying OBMC is selected based on the magnitude of Motion Vector Differences (MVD) between the current block and neighboring blocks. The direction with a larger MVD between the current block and the neighboring block is selected. In another embodiment, the direction with a smaller but nonzero MVD is selected. In one embodiment, the direction with a smaller average CU size of neighboring blocks is selected. The selection may be explicitly signalled to the decoder at a sequence level, picture level, CTU level, CU level, or block level. The current block and the neighboring block here may be either a current block or a current sub-block and a neighboring block or a neighboring sub-block respectively.
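The implicit selection based on MVD magnitude may be sketched as follows for the two candidate directions, left and above. The function name, the use of the sum of absolute component differences as the MVD magnitude, and the tie-breaking toward the left direction are assumptions made for the example; the text does not specify how the magnitude is measured or how ties are resolved.

```python
def select_obmc_direction(cur_mv, left_mv, above_mv, prefer="larger"):
    """Select the OBMC direction from MVD magnitudes.

    MVs are (horizontal, vertical) tuples. prefer="larger" picks the
    direction with the larger MVD; prefer="smaller_nonzero" picks the
    direction with the smaller but nonzero MVD, or None when both MVDs
    are zero.
    """
    def mvd(a, b):
        # Hypothetical magnitude measure: sum of absolute differences.
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    distances = {"left": mvd(cur_mv, left_mv), "above": mvd(cur_mv, above_mv)}
    if prefer == "larger":
        return max(distances, key=distances.get)
    nonzero = {name: d for name, d in distances.items() if d > 0}
    return min(nonzero, key=nonzero.get) if nonzero else None


if __name__ == "__main__":
    print(select_obmc_direction((0, 0), (1, 0), (4, 0)))
    print(select_obmc_direction((0, 0), (1, 0), (4, 0), prefer="smaller_nonzero"))
```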

Short-tap Interpolation Filter for OBMC regions

In order to reduce the computation complexity of the OBMC process, the length of the interpolation filter used in motion compensation of the OBMC process may be reduced. The length of the interpolation filter used in the OBMC process is shorter than the length of the interpolation filter used in the normal MC process. Conventionally, an 8-tap interpolation filter is used for luma samples and a 4-tap interpolation filter is used for chroma samples in the normal MC process; an embodiment of the present invention uses a 4-tap interpolation filter for performing OBMC on luma samples and a 2-tap interpolation filter for performing OBMC on chroma samples. In another example, a 1-tap interpolation filter is used for luma samples while a 2-tap interpolation filter is used for chroma samples. In yet another example, a 1-tap interpolation filter is used in the OBMC process for both luma and chroma samples. By reducing the number of interpolation filter taps, the computation complexity of OBMC may be reduced and extra memory bandwidth for OBMC regions is saved in the worst case scenario.

Use Integer MVs for Generating OBMC Regions

In the following description, a current block may be a current CU, PU, or sub-block, and a neighboring block may be a neighboring CU, PU, or sub-block. Some embodiments of the present invention only allow integer MVs in the OBMC process to further simplify the computation and reduce the memory bandwidth in motion compensation of the OBMC process. Equivalently, a 1-tap interpolation filter is employed in the OBMC process. Exemplary embodiments determine one or more MVs for generating an OBMC region, and generate one or more converted MVs by changing the one or more MVs to one or more integer MVs. The one or more converted MVs of each neighboring block are aligned to integer samples in the on-the-fly OBMC process, or the one or more converted MVs of the current block are aligned to integer samples in the pre-generation OBMC process, to avoid fractional motion compensation calculation and accessing additional reference samples in the OBMC process. The one or more converted MVs are used to derive an OBMC region by motion compensation. The OBMC predictor in the OBMC region is blended with an original predictor in the OBMC process. For example, the OBMC predictor is blended with an original predictor of a neighboring block in the pre-generation OBMC process, whereas the OBMC predictor is blended with an original predictor of the current block in the on-the-fly OBMC process.

Some examples of changing a MV into an integer MV for generating OBMC regions include truncating or rounding the MV into an integer MV. For example, a MV is changed by discarding the fractional part of the MV, rounding the MV to a nearest integer, rounding to an integer MV when the rounding offset is equal to 0.5 (e.g. offset=(1<<(shift_bit-1)), where shift_bit is the number of rounding bits), or rounding to an integer MV when the rounding offset is smaller than 0.5 (e.g. offset=(1<<(shift_bit-1))-1). If the distances between the original MV and two integer MVs are the same, the integer MV closer to zero may be selected, or in another example, the integer MV closer to infinity may be selected.
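The truncation and rounding options above may be illustrated with the following sketch for a single MV component. It assumes, for illustration only, that the component is stored as a fixed-point integer with shift_bit fractional bits (quarter-sample units when shift_bit is 2) and that the offset is applied to the magnitude so that ties resolve symmetrically around zero; the function and mode names are hypothetical.

```python
def to_integer_mv(mv_component, shift_bit=2, mode="half_away"):
    """Convert a fixed-point MV component to integer-sample precision.

    The result stays in the same fixed-point units, so its fractional
    bits are all zero.
      "truncate"  : discard the fractional part (toward zero)
      "half_away" : offset = 1 << (shift_bit - 1); ties go away from zero
      "half_zero" : offset = (1 << (shift_bit - 1)) - 1; ties go toward zero
    """
    if mode == "truncate":
        offset = 0
    elif mode == "half_away":
        offset = 1 << (shift_bit - 1)
    elif mode == "half_zero":
        offset = (1 << (shift_bit - 1)) - 1
    else:
        raise ValueError("unknown mode: " + mode)
    sign = -1 if mv_component < 0 else 1
    magnitude = ((abs(mv_component) + offset) >> shift_bit) << shift_bit
    return sign * magnitude


if __name__ == "__main__":
    # 6 in quarter-sample units is 1.5 samples, a tie between 1 and 2.
    for mode in ("truncate", "half_away", "half_zero"):
        print(mode, to_integer_mv(6, mode=mode), to_integer_mv(-6, mode=mode))
```

Applying the offset to the signed value with an arithmetic shift instead would resolve ties toward positive or negative infinity rather than away from or toward zero, which is another reading of the tie-breaking alternatives.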

In some embodiments, only the MV of the luma component is converted to an integer MV, and in another embodiment, the MV of the chroma components is also rounded or truncated to an integer MV.

Under the first implementation scheme of pre-generating OBMC regions for neighboring blocks when performing motion compensation on a current block, only the MV(s) in the horizontal direction for deriving a right OBMC region is changed to integer MV(s) according to one embodiment. The horizontal component of the MV is truncated or rounded to an integer value before generating the right OBMC region for a right neighboring block of the current block. That is, only the horizontal component in the MV(s) of the current block is converted to an integer for deriving a right OBMC region. In another embodiment, only the MV(s) in the vertical direction for deriving a bottom OBMC region is changed to integer MV(s). The vertical component of the MV is truncated or rounded to an integer value before generating the bottom OBMC region for a bottom neighboring block. Only the vertical component in the MV(s) of the current block is converted to an integer for deriving a bottom OBMC region. In yet another embodiment, the MV(s) in the horizontal direction for deriving the right OBMC region is changed to integer MV(s) and the MV(s) in the vertical direction for deriving the bottom OBMC region is changed to integer MV(s).

Conditionally Changing to Integer MV

In some embodiments, when applying the first implementation scheme, that is, pre-generation of OBMC regions, an OBMC region is derived by changing a current MV(s) or a MV component of a current MV(s) into an integer MV(s) or an integer if motion information of the current block and/or neighboring blocks satisfies a certain criterion. In the first implementation scheme, each of the OBMC regions generated by motion information of the current block is used to blend with one or more predictors of a neighboring block.

For example, the current MVs of a current block are changed to integer MVs before generating OBMC regions if the prediction direction of the current block is bi-prediction. If the prediction direction of the current block is uni-prediction, the current MV is used to generate OBMC regions for neighboring blocks without conversion. In another example, motion information of a neighboring block is also considered, so the current MV(s) of a current block is converted to an integer MV(s) if the prediction direction of the current block or the neighboring block is bi-prediction. That is, the current MV is not converted for generating an OBMC region for a neighboring block only if both the current block and neighboring block are uni-predicted.
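The two examples above for the first implementation scheme may be sketched as follows. The sketch assumes that a uni-predicted block carries one MV and a bi-predicted block carries two, that MVs are (horizontal, vertical) tuples in hypothetical quarter-sample fixed-point units, and that conversion uses round-to-nearest with an arithmetic shift; the function names are illustrative only.

```python
def round_mv(mv, shift_bit=2):
    """Round both components of a fixed-point MV to integer-sample precision."""
    half = 1 << (shift_bit - 1)
    return tuple(((c + half) >> shift_bit) << shift_bit for c in mv)


def mvs_for_pregenerated_obmc(current_mvs, neighbor_is_bi=False, shift_bit=2):
    """Return the MV(s) used to pre-generate an OBMC region for a neighbor.

    current_mvs holds one MV (uni-prediction) or two MVs (bi-prediction).
    Leave neighbor_is_bi as False to model the example that considers only
    the current block; pass the neighbor's status to model the example that
    also considers the neighboring block.
    """
    current_is_bi = len(current_mvs) == 2
    if current_is_bi or neighbor_is_bi:
        return [round_mv(mv, shift_bit) for mv in current_mvs]
    return list(current_mvs)  # both uni-predicted: no conversion


if __name__ == "__main__":
    print(mvs_for_pregenerated_obmc([(5, 6)]))
    print(mvs_for_pregenerated_obmc([(5, 6)], neighbor_is_bi=True))
    print(mvs_for_pregenerated_obmc([(5, 6), (-3, 1)]))
```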

When the second implementation scheme is applied, the MV(s) or a MV component of a neighboring block is converted into an integer MV(s) or an integer MV component for generating an OBMC region for the current block if motion information of the neighboring block and/or the current block satisfies a certain criterion. In the second implementation scheme, one or more OBMC regions generated by motion information of one or more neighboring blocks are used to process a current block by blending the one or more OBMC regions with the current predictor. In some embodiments, the MV of the neighboring block is conditionally modified to an integer precision depending on motion information of the neighboring block and/or current block. In one embodiment, the MVs of the neighboring block are changed to integer MVs if the prediction direction of the neighboring block is bi-prediction. For example, when the current block and neighboring block are both predicted by uni-prediction, the MV from the neighboring block is not converted into an integer MV; when the current block is uni-predicted and the neighboring block is bi-predicted, the MVs from the neighboring block are converted to integer MVs; and when the current block is bi-predicted with integer MVs and the neighboring block is bi-predicted, the MVs from the neighboring block are converted to integer MVs. In another embodiment, the MV(s) of the neighboring block is converted to an integer MV(s) for generating an OBMC region for the current block if the prediction direction of the current block or neighboring block is bi-prediction. In yet another embodiment, the MV(s) of a neighboring block is converted to an integer MV(s) for generating an OBMC region for a current block only if the current block is bi-predicted with integer MVs and the neighboring block is also bi-predicted.

Conditionally Pre-generating OBMC Regions

Some embodiments of the first implementation scheme always convert the MV(s) to an integer MV before generating one or more OBMC regions; however, the one or more OBMC regions for one or more neighboring blocks are conditionally pre-generated. In one embodiment, right OBMC and bottom OBMC regions are pre-generated only when MV components of a current MV(s) are not integers in both horizontal and vertical directions, and if the MV components of the current MV(s) are not integers in both directions, the current MV(s) is changed to an integer MV(s) for generating OBMC regions. In this embodiment, the MV components of the current block are first checked before pre-generating a right OBMC region for a right neighboring block; if one of the MV components in horizontal and vertical directions is an integer, the right OBMC region will not be pre-generated. Similarly, the MV components are checked before pre-generating a bottom OBMC region; if one of the MV components in horizontal and vertical directions is an integer, the bottom OBMC region will not be pre-generated. In another embodiment, the right and bottom OBMC regions are pre-generated when a MV component of the current MV in either the horizontal or vertical direction is not an integer. When the MV component in the horizontal or vertical direction is not an integer, OBMC regions are pre-generated by changing the MV component or the current MV(s) to an integer or an integer MV(s). In this embodiment, the current MV(s) of the current block is first checked, and the right OBMC region is not pre-generated if both the MV components in horizontal and vertical directions are integers. Similarly, the bottom OBMC region is not generated if both the MV components in horizontal and vertical directions are integers.

Another embodiment of conditionally pre-generating an OBMC region checks the current MV in a predefined direction for a right or bottom OBMC region, and changes the current MV to an integer MV when the current MV in the predefined direction is not an integer. For example, the right OBMC region is generated only when a horizontal component of the MV is not an integer, and the horizontal MV component or MV components in all directions are changed to integers when generating the right OBMC region. In another example, the bottom OBMC region is generated only when the vertical component of the MV is not an integer, and the vertical MV component or all MV components are changed to integers when generating the bottom OBMC region. In yet another example, each of the right and bottom OBMC regions is generated only when the MV component in the horizontal and vertical direction is not an integer, respectively. When generating the right or bottom OBMC region, the corresponding MV component or the MV is changed to an integer or an integer MV for generating the right or bottom OBMC region.
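The pre-generation conditions above differ only in which MV components must be fractional. The sketch below contrasts them under the assumption that MVs are stored as (horizontal, vertical) pairs in units of 1/4 sample; that precision and all function names are illustrative assumptions rather than part of the described embodiments.

```python
QPEL = 4  # assumed MV storage precision: units of 1/4 sample


def is_int(component):
    """True when a MV component points to an integer sample position."""
    return component % QPEL == 0


def pregen_if_both_fractional(mv):
    """Pre-generate right/bottom regions only when both components are fractional."""
    return not is_int(mv[0]) and not is_int(mv[1])


def pregen_if_any_fractional(mv):
    """Pre-generate right/bottom regions when either component is fractional."""
    return not is_int(mv[0]) or not is_int(mv[1])


def pregen_by_direction(mv):
    """Right region depends on the horizontal component, bottom on the vertical."""
    return {"right": not is_int(mv[0]), "bottom": not is_int(mv[1])}
```

For a MV of (5, 8), that is 1.25 samples horizontally and 2 samples vertically, the first condition pre-generates nothing, the second pre-generates both regions, and the direction-based condition pre-generates only the right region.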

In some embodiments, the current block is bi-predicted and only one of the MVs in List 0 and List 1 is an integer MV in both horizontal and vertical directions while the other MV is a fractional MV in at least one direction, or only one of the MVs in List 0 and List 1 has an integer MV component in a predefined direction while the other MV has a fractional MV component in the predefined direction. The OBMC regions will be pre-generated using the fractional MV by changing the fractional MV into an integer MV, or the OBMC regions will be pre-generated using the fractional MV component in the predefined direction by changing the fractional MV component to an integer or changing all MV components to integers. For example, the prediction direction for a current block is bi-prediction, and the MV in List 0 is an integer MV in both horizontal and vertical directions while the MV in List 1 is not an integer MV in both the directions. The MV in List 1 is selected and changed to an integer MV for generating OBMC regions. When one of these OBMC regions is used by a neighboring block, it is blended with an original predictor derived according to motion information of the neighboring block, and the weighting factor for the OBMC predictor of the OBMC region may be reduced; for example, the weighting factor may be decreased to half of an original weighting factor. In another example, only one of the current MVs in List 0 and List 1 has an integer MV component in the horizontal direction, and a right OBMC region is generated using the MV with a fractional MV component in the horizontal direction. The MV is changed to have an integer MV component in the horizontal direction or is changed to have integer MV components in both directions before generating the right OBMC region. When the OBMC predictor of this right OBMC region is blended with another predictor, the weighting factor for the OBMC predictor may be lower than the original weighting factor; for example, the weighting factor is decreased to half of the original weighting factor. For generating a bottom OBMC region according to bi-predicted motion information of a current block, if only one of the MVs in List 0 and List 1 has an integer MV component in the vertical direction, the bottom OBMC region is generated using the MV that has a fractional MV component in the vertical direction. The MV is changed to have an integer MV component in the vertical direction or integer MV components in both directions before generating the bottom OBMC region. The weighting factor for the OBMC predictor of this kind of bottom OBMC region may be lowered, for example, decreased to half of the original weighting factor for the normal bottom OBMC region.
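A minimal sketch of the List 0 / List 1 selection described above follows. It assumes MVs in units of 1/4 sample and returns the fractional MV together with the halved weighting scale given as an example in the text; the function name and the return convention are hypothetical.

```python
def select_mv_for_obmc(mv_l0, mv_l1, precision=4):
    """Pick the fractional MV of a bi-predicted block when exactly one list MV is integer.

    Returns (mv_to_convert, weight_scale), or None when the special case
    does not apply (both MVs integer, or both MVs fractional).
    """
    int0 = all(c % precision == 0 for c in mv_l0)
    int1 = all(c % precision == 0 for c in mv_l1)
    if int0 == int1:
        return None
    fractional_mv = mv_l1 if int0 else mv_l0
    # 0.5 mirrors the "half of an original weighting factor" example.
    return fractional_mv, 0.5
```

The selected MV would then be changed to an integer MV before the OBMC regions are generated.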

Conditionally Skip OBMC Blending

In some embodiments, OBMC blending depends on a similarity of MVs of a current block and a neighboring block. For example, the MVs are checked by calculating a MV difference between the MVs of the current block and the neighboring block and comparing the MV difference with a predefined threshold. OBMC blending between the current block and the neighboring block is skipped if the MV difference is larger than the predefined threshold. In one embodiment, the MV similarity checking is performed before changing the MV for generating an OBMC region to an integer MV. In another embodiment, the MV similarity checking is performed after changing the MV for generating an OBMC region to an integer MV. The MV for generating an OBMC region is the MV of the current block in the first implementation scheme, and the MV for generating an OBMC region is the MV of the neighboring block in the second implementation scheme. In yet another embodiment, skipping OBMC blending according to the MV similarity is disabled, and the MV(s) is changed to an integer MV before generating an OBMC region.
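The similarity check above may be sketched as follows. The specification does not state how the MV difference is measured, so the sum of absolute component differences used here is an assumption, as are the function name and the threshold units.

```python
def skip_obmc_blending(mv_cur, mv_nbr, threshold):
    """Return True when OBMC blending is skipped for this pair of blocks.

    The MV difference is measured as the sum of absolute component
    differences (an assumed metric); blending is skipped when the
    difference is larger than the predefined threshold, as in the text.
    """
    diff = abs(mv_cur[0] - mv_nbr[0]) + abs(mv_cur[1] - mv_nbr[1])
    return diff > threshold
```

The same function can be called either before or after the MV is changed to an integer MV, matching the two orderings described above.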

Conditionally Generating OBMC Regions

In some embodiments of the first implementation scheme, OBMC regions are generated only if a size, width, or height of a current block is larger than or equal to a predefined threshold. Some examples of the predefined threshold for the size are 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, and 16384. For example, the OBMC regions are generated only if a current block has a size larger than or equal to 64. In one embodiment, OBMC regions are generated only if the width of a current block is larger than or equal to a first predefined threshold, and the height of the current block is larger than or equal to a second predefined threshold. Some examples of a combination of the first and second predefined thresholds are (4, 4), (4, 8), (4, 16), (4, 32), (4, 64), (4, 128), (8, 4), (8, 8), (8, 16), (8, 32), (8, 64), (8, 128), (16, 4), (16, 8), (16, 16), (16, 32), (16, 64), (16, 128), (32, 4), (32, 8), (32, 16), (32, 32), (32, 64), (32, 128), (64, 4), (64, 8), (64, 16), (64, 32), (64, 64), (64, 128), (128, 4), (128, 8), (128, 16), (128, 32), (128, 64), and (128, 128). For example, OBMC regions are generated only if the width of a current block is larger than or equal to 8 and the height is larger than or equal to 16.

In some embodiments of the second implementation scheme, the constraints for deciding whether OBMC regions are generated are applied to one or more neighboring blocks. For example, the OBMC regions are generated only if a width of a neighboring block is larger than or equal to 8 and a height of the neighboring block is larger than or equal to 16.
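The size constraints above reduce to simple comparisons. In this sketch the default thresholds are the example values from the text (64 for the size, and 8 and 16 for the width and height); treating "size" as width multiplied by height is an assumption, and the function names are hypothetical. Under the first implementation scheme the arguments describe the current block, and under the second they describe the neighboring block.

```python
def obmc_allowed_by_size(width, height, min_size=64):
    """Generate OBMC regions only if the block size (assumed width * height) is large enough."""
    return width * height >= min_size


def obmc_allowed_by_dims(width, height, min_width=8, min_height=16):
    """Generate OBMC regions only if both the width and height thresholds are met."""
    return width >= min_width and height >= min_height
```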

The conditional OBMC region generation methods may apply only to luma components, or may apply to both luma and chroma components.

Reduce Blending Lines for OBMC

The number of blending lines for OBMC is the number of pixels in the horizontal direction in a right OBMC region or the number of pixels in the vertical direction in a bottom OBMC region. The number of blending lines for OBMC is also defined as the number of rows of pixels on the horizontal boundary or the number of columns of pixels on the vertical boundary processed by OBMC blending. In some embodiments of converting a fractional MV to an integer MV, the fractional MV is rounded up to the integer MV when the fractional part is larger than 0.5, or when the fractional part is larger than or equal to 0.5. FIG. 11 illustrates an example of fetching reference samples for a current block and two OBMC regions from a reference picture. The size of the current block is W×H, and the number of blending lines for OBMC is 4 in this example. The OBMC predictors (OBMC R 1120 and OBMC B 1130) for the right and bottom OBMC regions have a 1-pixel gap from a current predictor 1110 for the current block in the reference picture because the fractional current MV is rounded up to an integer MV. In this case, a larger reference block of (W+8)×(H+8), rather than (W+7)×(H+7), is required for fetching the reference samples for the current block and the two OBMC regions. In order to keep the maximum reference block size for fetching reference pixels for the current block and OBMC regions within a range, the OBMC process may only blend 3 lines of pixels at the block boundary instead of 4 lines. An embodiment reduces a maximum number of OBMC blending lines so that the worst-case bandwidth does not increase when a MV is rounded up to an integer MV for pre-generating OBMC regions. For example, a maximum number of OBMC blending lines for the luma component is reduced from 4 to 3. A maximum number of OBMC blending lines for the chroma components is reduced from 2 to 1 according to one embodiment, or remains 2 according to another embodiment. In one embodiment, the maximum numbers of blending lines for the luma and chroma components are reduced from 4 to 3 and from 2 to 1, respectively.
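The (W+7) versus (W+8) figures above can be reproduced with a short calculation along one dimension. The sketch assumes an 8-tap interpolation filter that reads 3 samples before and 4 samples after each fractional position, and an OBMC region fetched at an integer position without interpolation; these assumptions and the function name are illustrative and are not stated in this section.

```python
def fetch_extent(block_len, blend_lines, rounded_up, taps=8):
    """Number of reference samples spanned along one dimension.

    Positions are relative to the integer part of the current MV.
    The current block needs interpolation support on both sides; the
    right/bottom OBMC region sits just past the block and is shifted
    by one sample when the MV was rounded up to the next integer.
    """
    before = taps // 2 - 1               # 3 samples for an 8-tap filter
    after = taps // 2                    # 4 samples for an 8-tap filter
    cur_min = -before
    cur_max = block_len - 1 + after
    shift = 1 if rounded_up else 0
    obmc_max = block_len + shift + blend_lines - 1
    return max(cur_max, obmc_max) - cur_min + 1
```

With a block width of 16, four blending lines and a rounded-up MV span 24 samples (W+8), while three blending lines stay within 23 samples (W+7), which is the same extent needed when the MV is not rounded up.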

Some other embodiments decide the number of OBMC blending lines for each block according to one or more predefined criteria. Exemplary systems determine the number of OBMC blending lines according to the MV used for generating one or more OBMC regions. For example, an exemplary system checks if the fractional part of an absolute value of the MV in list 0 or list 1 is larger than 0.5, and reduces the number of OBMC blending lines for the luma component from 4 to 3 only if the fractional part is larger than 0.5. The number of blending lines remains 4 if the MV is an integer MV or if the fractional part is less than or equal to 0.5. In an embodiment considering both luma and chroma components, the number of blending lines for the luma component is reduced from 4 to 3 only when the fractional part of an absolute value of the luma MV is larger than 0.5, and the number of blending lines for the chroma components is reduced from 2 to 1 only when the fractional part of an absolute value of the chroma MV is larger than 0.5. In another embodiment considering both luma and chroma components, the number of blending lines for the luma component is reduced from 4 to 3 only when the fractional part of an absolute value of the luma MV is larger than 0.5, and the number of blending lines for the chroma components is reduced from 2 to 1 if the number of blending lines for the luma component is reduced from 4 to 3. In another embodiment, the number of OBMC blending lines for the luma component is reduced from 4 to 3 only when the fractional part of an absolute value of the luma MV in list 0 or list 1 is larger than or equal to 0.5. In an embodiment, the number of OBMC blending lines for the luma component is reduced from 4 to 3 only when the fractional part of an absolute value of the luma MV is larger than or equal to 0.5, and the number of OBMC blending lines for the chroma components is reduced from 2 to 1 only when the fractional part of an absolute value of the chroma MV is larger than or equal to 0.5. In an embodiment, the number of OBMC blending lines for the luma component is reduced from 4 to 3 only when the fractional part of an absolute value of the luma MV is larger than or equal to 0.5, and the number of OBMC blending lines for the chroma components is reduced from 2 to 1 if the number of OBMC blending lines for the luma component is reduced from 4 to 3.
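The fractional-part test above can be sketched as follows, assuming MV components in units of 1/4 sample and assuming that any component of any list MV may trigger the reduction, since the text does not single out a component. The inclusive flag selects between the "larger than 0.5" and "larger than or equal to 0.5" variants; all names are hypothetical.

```python
def frac_part(component, precision=4):
    """Fractional part of the absolute value of a MV component, in samples."""
    return (abs(component) % precision) / precision


def luma_blending_lines(mvs, inclusive=False):
    """Return 3 when a fractional part exceeds (or reaches) 0.5, else 4.

    mvs holds the list 0 and/or list 1 MVs as (horizontal, vertical) pairs.
    """
    for mv in mvs:
        for c in mv:
            f = frac_part(c)
            if f > 0.5 or (inclusive and f == 0.5):
                return 3
    return 4


def chroma_blending_lines_from_luma(luma_lines):
    """Variant in which the chroma reduction simply follows the luma reduction."""
    return 1 if luma_lines == 3 else 2
```

A horizontal component of 3 (0.75 sample) reduces the luma blending lines to 3, while a component of 2 (0.5 sample) does so only in the inclusive variant.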

Under the second implementation scheme, OBMC regions are generated just before the OBMC blending process of a current block. An embodiment of reducing the number of OBMC blending lines first checks a luma or chroma MV of a neighboring block for deriving an OBMC region for a luma component or chroma components, respectively. For example, an exemplary system reduces the number of blending lines at a top block boundary for the luma component from 4 to 3 only when the fractional part of an absolute value of the luma MV in the vertical direction is larger than 0.5, and the number of blending lines at the top block boundary for the luma component is 4 otherwise. Similarly, the number of OBMC blending lines at a left block boundary is 3 for the luma component only when the fractional part of an absolute value of the luma MV in the horizontal direction is larger than 0.5. In another example, the number of OBMC blending lines at the top or left block boundary for the luma component is reduced from 4 to 3 only when the fractional part of an absolute value of the luma MV in the vertical or horizontal direction is larger than 0.5, and the number of OBMC blending lines at the top or left block boundary for the chroma components is reduced from 2 to 1 only when the fractional part of an absolute value of the chroma MV in the vertical or horizontal direction is larger than 0.5. In another example, the number of OBMC blending lines at the top or left block boundary for the luma component is reduced from 4 to 3 only when the fractional part of an absolute value of the luma MV in the vertical or horizontal direction is larger than 0.5, and the number of OBMC blending lines at the top or left block boundary for the chroma components is reduced from 2 to 1 if the number of OBMC blending lines at the top or left block boundary for the luma component is reduced from 4 to 3. Similarly, the above embodiments may be modified to determine whether to reduce the OBMC blending lines when the fractional part of the absolute value of the luma or chroma MV is larger than or equal to 0.5.

Adaptive Number of OBMC Blending Lines

Some embodiments of the present invention adaptively determine the number of OBMC blending lines for a current block according to a width or height of the current block depending on the direction of OBMC blending. For example, the number of OBMC blending lines for a left boundary of the current block depends on the width of the current block, and the number of OBMC blending lines for a top boundary of the current block depends on the height of the current block. In an exemplary embodiment, the number of OBMC blending lines for the luma component is reduced from 4 to 2 if the width or height of the current block is less than a predefined threshold. For example, the number of OBMC blending lines for the luma component at the left block boundary is reduced from 4 to 2 if the width of the current block is less than a first predefined threshold, and the number of OBMC blending lines is 4 if the width of the current block is larger than or equal to the first predefined threshold. The number of OBMC blending lines for the luma component at the top block boundary is reduced from 4 to 2 if the height of the current block is less than a second predefined threshold, and the number of OBMC blending lines is 4 if the height of the current block is larger than or equal to the second predefined threshold. The first and second predefined thresholds may be the same or different. In the following examples, the first and second predefined thresholds are both 8. In one example, if the width of a current block is less than 8 and the height of the current block is larger than or equal to 8, the number of blending lines for the luma component at the left boundary is reduced from 4 to 2; however, the number of blending lines for the luma component at the top boundary remains 4. In another example, if the width of a current block is less than 8 and the height of the current block is also less than 8, the numbers of blending lines for the luma component at both the left and top boundaries are reduced from 4 to 2. In another example, if the height of a current block is less than 8 and the width of the current block is larger than or equal to 8, the number of blending lines for the luma component at the top boundary is reduced from 4 to 2, and the number of blending lines for the luma component at the left boundary remains 4.
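The three size-based examples above follow from one rule per boundary. The sketch below uses the example thresholds of 8 from the text; the function name and the dictionary return form are hypothetical.

```python
def adaptive_luma_lines(width, height, width_thr=8, height_thr=8):
    """Luma OBMC blending lines at the left and top boundaries of a block.

    The left boundary depends on the block width and the top boundary
    on the block height; each falls from 4 to 2 below its threshold.
    """
    left = 2 if width < width_thr else 4
    top = 2 if height < height_thr else 4
    return {"left": left, "top": top}
```

A 4×16 block blends 2 lines at the left boundary and 4 at the top, a 4×4 block blends 2 lines at both boundaries, and a 16×4 block blends 4 lines at the left boundary and 2 at the top.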

Some other embodiments of adaptively determining the number of OBMC blending lines determine the number of OBMC blending lines according to a length of an interpolation filter used in motion compensation. The interpolation filter length is also known as a number of taps in the interpolation filter. For example, more OBMC blending lines are blended when a longer interpolation filter is employed. In one specific embodiment, (L/2)−1 OBMC blending lines are used when the length of the interpolation filter is L.
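As a quick check of the (L/2)−1 rule, the following sketch evaluates it for a few even filter lengths. The function name is hypothetical, and the use of integer division assumes an even number of taps.

```python
def blending_lines_for_filter(taps):
    """Number of OBMC blending lines for an interpolation filter with the given tap count."""
    return taps // 2 - 1
```

An 8-tap filter therefore gives 3 blending lines, a 6-tap filter gives 2, and a 4-tap filter gives 1.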

In one embodiment, adaptively determining the number of OBMC blending lines may be enabled or disabled according to a flag, and the number of blending lines is always equal to 4 for the luma component and 2 for the chroma components when the flag indicates that the adaptive number of OBMC blending lines is disabled.

In one embodiment, the number of OBMC blending lines for the chroma components is reduced in accordance with the luma component. For example, when the number of blending lines in the luma component is reduced from 4 to 2, the number of blending lines in the chroma components is reduced from 2 to 1; otherwise, the numbers of blending lines are 4 for the luma component and 2 for the chroma components.

The above methods of reducing or determining the number of OBMC blending lines may be combined with one of the methods of generating OBMC regions by the integer MV(s). For example, the number of OBMC blending lines is reduced from 4 to 3 when the OBMC regions are generated by an integer MV converted by rounding to a nearest integer. In some embodiments, all MVs are converted to integer MVs before generating the OBMC regions; in some other embodiments, a MV for generating an OBMC region is converted to an integer MV when a current block or a neighboring block satisfies a predefined criterion. The predefined criterion may be related to one or a combination of a size of the current or neighboring block, a width of the current or neighboring block, a height of the current or neighboring block, a prediction direction of the current or neighboring block, a fractional part of the MV(s) of the current or neighboring block, and MV(s) of the current or neighboring block. An embodiment of conditionally changing a MV to an integer MV for chroma components depends on whether the corresponding MV for the luma component is changed to an integer MV. For example, the MV for the chroma components is converted to an integer MV if the MV for the luma component is converted to an integer MV. In another embodiment, when the luma MV is changed to an integer MV, the MV for the chroma components is derived from the integer MV of the luma component. In yet another embodiment, when the luma MV is changed to an integer MV, the MV for the chroma components is also changed to an integer MV, and the number of OBMC blending lines for the luma component is reduced from 4 to 3 and the number of OBMC blending lines for the chroma components is reduced from 2 to 1.

Exemplary Flowcharts

FIGS. 12A and 12B illustrate two exemplary flowcharts of a video encoding or decoding system for encoding or decoding blocks with Overlapped Block Motion Compensation (OBMC). FIG. 12A demonstrates an example of processing a current block according to the first OBMC implementation scheme, and FIG. 12B demonstrates an example of processing a current block according to the second OBMC implementation scheme. In Step S1210 of FIG. 12A, the video encoding or decoding system receives input data associated with a current block in a current picture. At the encoder side, the input data corresponds to pixel data to be encoded. At the decoder side, the input data corresponds to coded data or prediction residual to be decoded. In Step S1212, a current MV is determined for generating at least an OBMC region for one or more neighboring blocks, where the current MV may have a List 0 MV and a List 1 MV when the current block is bi-predicted. The current MV is changed to a converted MV by rounding or truncating to an integer MV or changing a MV component to an integer component in Step S1214. In Step S1216, the video encoding or decoding system derives an original predictor for the current block by motion compensation using the current MV and derives one or more OBMC regions by motion compensation using the converted MV for the one or more neighboring blocks. The one or more OBMC regions are pre-generated for one or more neighboring blocks. For example, the converted MV is used to locate OBMC predictors for a right OBMC region and a bottom OBMC region, and the right and bottom OBMC regions are later used to process a right and a bottom neighboring block. The video encoding or decoding system retrieves one or more OBMC predictors in one or more OBMC regions associated with the current block from memory storage, and stores the one or more OBMC regions for the one or more neighboring blocks in the memory storage. In Step S1218, the current block is encoded or decoded by blending the original predictor for the current block and the one or more OBMC predictors associated with the current block.
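The MV conversion of Step S1214 may be sketched as below. The sketch assumes MV components in units of 1/4 sample, rounds on the magnitude so that a fractional part of 0.5 or more moves away from zero (one of the two rounding variants mentioned earlier), and truncates toward zero; these conventions and the function names are assumptions and are not mandated by the flowchart.

```python
def to_integer_component(component, mode="round", precision=4):
    """Change one MV component (in 1/precision sample units) to an integer position."""
    sign = -1 if component < 0 else 1
    magnitude = abs(component)
    if mode == "truncate":
        whole = magnitude // precision
    elif mode == "round":
        whole = (magnitude + precision // 2) // precision
    else:
        raise ValueError("mode must be 'round' or 'truncate'")
    return sign * whole * precision


def convert_mv(mv, mode="round", components=(0, 1)):
    """Convert the selected components (0: horizontal, 1: vertical) of a MV."""
    return tuple(
        to_integer_component(c, mode) if i in components else c
        for i, c in enumerate(mv)
    )
```

Passing components=(0,) converts only the horizontal component, which corresponds to changing a single MV component to an integer component rather than the whole MV.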

Step S1220 in FIG. 12B receives input video data of a current block in a current picture. A MV of a neighboring block is determined in Step S1222 for generating an OBMC region for the current block. In the example shown in FIG. 12B, OBMC is only applied to one side of the current block, so only one OBMC region generated by one neighboring block is required; however, two or more OBMC regions may be generated when OBMC is applied to two or more sides of the current block. In Step S1224, the video encoding or decoding system generates a converted MV by changing the MV of the neighboring block to an integer MV or by changing a MV component of the MV to an integer component. An original predictor for the current block is derived by motion compensation using a current MV of the current block, and an OBMC predictor in the OBMC region for the current block is derived by motion compensation using the converted MV in Step S1226. The video encoding or decoding system encodes or decodes the current block by blending the original predictor for the current block and the OBMC predictor for the current block in Step S1228.
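The blending of Step S1228 can be illustrated for a top block boundary as follows. The per-line weights are supplied by the caller because this section does not specify them; the values 1/4, 1/8, 1/16, and 1/32 in the usage note are illustrative assumptions only, as are the function name and the list-of-rows data layout.

```python
def blend_top_boundary(original, obmc, weights):
    """Blend an OBMC predictor into the first rows of the original predictor.

    original: list of pixel rows for the current block.
    obmc:     list of pixel rows from the OBMC region (one row per blending line).
    weights:  weight of the OBMC predictor for each blending line, with the
              line nearest to the block boundary first.
    """
    out = [row[:] for row in original]
    for i, w in enumerate(weights):
        out[i] = [(1 - w) * o + w * p for o, p in zip(original[i], obmc[i])]
    return out
```

With illustrative weights of [0.25, 0.125, 0.0625, 0.03125], an original value of 100 and an OBMC value of 60 blend to 90.0 on the first line and 95.0 on the second, and rows beyond the fourth line are left unchanged. Passing only three weights models the reduced number of blending lines.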

OBMC Interacts with BiCW

Bi-prediction with CU weights (BiCW), also known as Generalized Bi-prediction (GBI), is a technique using a first reference block selected from a first reference picture and a second reference block selected from a second reference picture to code a current block. Each reference block is associated with a weight, and the current block is predicted by a weighted sum of the two reference blocks. In an embodiment of applying OBMC to a current block having a neighboring block coded in BiCW, an OBMC region is generated by equal weights regardless of the actual BiCW weights of the neighboring block. In another embodiment of applying OBMC to a current block having a neighboring block coded in BiCW, BiCW weights of the neighboring block are stored and an OBMC region is generated according to the actual BiCW weights of the neighboring block.
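The two BiCW embodiments above differ only in which weights are used when generating the OBMC region. The sketch assumes integer weights expressed in eighths that sum to 8, with rounding to the nearest integer; this weight representation and the function names are assumptions, since the section does not define them.

```python
def bi_predict(ref0, ref1, w1=4, denom=8):
    """Weighted sum of two reference blocks (flattened to lists of samples).

    w1 is the weight of the second reference block in units of 1/denom;
    the first reference block receives the remaining weight.
    """
    w0 = denom - w1
    return [(w0 * a + w1 * b + denom // 2) // denom for a, b in zip(ref0, ref1)]


def obmc_region_from_bicw_neighbor(ref0, ref1, neighbor_w1, use_actual_weights):
    """Generate the OBMC region with equal weights or with the neighbor's stored weights."""
    return bi_predict(ref0, ref1, neighbor_w1 if use_actual_weights else 4)
```

Using equal weights avoids storing the BiCW weights of the neighboring block, whereas using the actual weights requires keeping them available until the current block is processed.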

OBMC Interacts with BDOF

In general, a video coding system performs Bidirectional Optical Flow (BDOF) during motion compensation. Normally, a motion vector of a current block identifies the location of a reference block with respect to the current block in a reference picture. When BDOF is applied to the current block, the video coding system modifies the motion vector on a per-pixel basis for the current block. That is, rather than retrieving each pixel of the reference block as a block unit, according to BDOF, the video coding system determines per-pixel modifications to the motion vector for the current block, and constructs the reference block such that the reference block includes reference pixels identified by the motion vector and the per-pixel modification for the corresponding pixel of the current block. In an embodiment of applying BDOF to generate an OBMC region, the video coding system retrieves reference pixels identified by the original MV and the per-pixel modification for the corresponding pixel of the OBMC region. In another embodiment, the BDOF technique is disabled for generating OBMC regions.

Video Encoder and Decoder Implementations

The foregoing proposed video processing methods can be implemented in video encoders or decoders. For example, a proposed video processing method is implemented in a predictor derivation module of an encoder, and/or a predictor derivation module of a decoder. In another example, a proposed video processing method is implemented in a motion compensation module of an encoder, and/or a motion compensation module of a decoder. Alternatively, any of the proposed methods is implemented as a circuit coupled to the predictor derivation or motion compensation module of the encoder and/or the predictor derivation module or motion compensation module of the decoder, so as to provide the information needed by the predictor derivation module or the motion compensation module.

FIG. 13 illustrates an exemplary system block diagram for a Video Encoder 1300 implementing various embodiments of the present invention. Intra Prediction 1310 provides intra predictors based on reconstructed video data of a current picture. Inter Prediction 1312 performs motion estimation (ME) and motion compensation (MC) to provide inter predictors based on video data from other picture or pictures. To encode a current block by an overlapped sub-block motion compensation coding tool according to some embodiments of the present invention, each overlapped region in the current block is predicted by blending two or more initial predictors derived by corresponding sub-block MVs of the overlapped region. In some other embodiments, the Inter Prediction 1312 derives an OBMC predictor in an OBMC region by motion compensation using a converted MV, where the converted MV is generated by changing a MV to an integer MV or changing a MV component to an integer component. A final predictor for each block is generated by blending one or more OBMC predictors with an original predictor in the Inter Prediction 1312. Either Intra Prediction 1310 or Inter Prediction 1312 supplies the selected predictor to Adder 1316 to form prediction errors, also called prediction residual. The prediction residual of the current block is further processed by Transformation (T) 1318 followed by Quantization (Q) 1320. The transformed and quantized residual signal is then encoded by Entropy Encoder 1332 to form a video bitstream. The video bitstream is then packed with side information. The transformed and quantized residual signal of the current block is processed by Inverse Quantization (IQ) 1322 and Inverse Transformation (IT) 1324 to recover the prediction residual. As shown in FIG. 13, the recovered prediction residual is added back to the selected predictor at Reconstruction (REC) 1326 to produce reconstructed video data. The reconstructed video data may be stored in Reference Picture Buffer (Ref. Pict. Buffer) 1330 and used for prediction of other pictures. The reconstructed video data recovered from REC 1326 may be subject to various impairments due to encoding processing; consequently, In-loop Processing Filter 1328 is applied to the reconstructed video data before storing in the Reference Picture Buffer 1330 to further enhance picture quality.

A corresponding Video Decoder 1400 for decoding the video bitstream generated from the Video Encoder 1300 of FIG. 13 is shown in FIG. 14. The video bitstream is the input to Video Decoder 1400 and is decoded by Entropy Decoder 1410 to parse and recover the transformed and quantized residual signal and other system information. The decoding process of Decoder 1400 is similar to the reconstruction loop at Encoder 1300, except Decoder 1400 only requires motion compensation prediction in Inter Prediction 1414. Each block is decoded by either Intra Prediction 1412 or Inter Prediction 1414. Switch 1416 selects an intra predictor from Intra Prediction 1412 or an inter predictor from Inter Prediction 1414 according to decoded mode information. Inter Prediction 1414 performs overlapped sub-block motion compensation on a current block by blending initial predictors derived from overlapped sub-block MVs according to some exemplary embodiments. Inter Prediction 1414 generates an OBMC region using one or more derived MVs for blending with an original predictor according to some other exemplary embodiments. The one or more derived MVs are generated by changing one or more MVs to one or more integer MVs or changing a MV component of the one or more MVs to an integer component. The transformed and quantized residual signal associated with each block is recovered by Inverse Quantization (IQ) 1420 and Inverse Transformation (IT) 1422. The recovered residual signal is reconstructed by adding back the predictor in REC 1418 to produce reconstructed video. The reconstructed video is further processed by In-loop Processing Filter (Filter) 1424 to generate final decoded video. If the currently decoded picture is a reference picture for later pictures in decoding order, the reconstructed video of the currently decoded picture is also stored in Ref. Pict. Buffer 1426.

Various components of Video Encoder 1300 and Video Decoder 1400 in FIG. 13 and FIG. 14 may be implemented by hardware components, one or more processors configured to execute program instructions stored in a memory, or a combination of hardware and processor. For example, a processor executes program instructions to control receiving of input data associated with a current picture. The processor is equipped with a single or multiple processing cores. In some examples, the processor executes program instructions to perform functions in some components in Encoder 1300 and Decoder 1400, and the memory electrically coupled with the processor is used to store the program instructions, information corresponding to the reconstructed images of blocks, and/or intermediate data during the encoding or decoding process. The memory in some embodiments includes a non-transitory computer readable medium, such as a semiconductor or solid-state memory, a random access memory (RAM), a read-only memory (ROM), a hard disk, an optical disk, or other suitable storage medium. The memory may also be a combination of two or more of the non-transitory computer readable mediums listed above. As shown in FIGS. 13 and 14, Encoder 1300 and Decoder 1400 may be implemented in the same electronic device, so various functional components of Encoder 1300 and Decoder 1400 may be shared or reused if implemented in the same electronic device.

Embodiments of the video processing method for encoding or decoding may be implemented in a circuit integrated into a video compression chip or program codes integrated into video compression software to perform the processing described above. For example, determining of a candidate set including an average candidate for coding a current block may be realized in program codes to be executed on a computer processor, a Digital Signal Processor (DSP), a microprocessor, or a field programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software codes or firmware codes that define the particular methods embodied by the invention.

Reference throughout this specification to “an embodiment”, “some embodiments”, or similar language means that a particular feature, structure, or characteristic described in connection with the embodiments may be included in at least one embodiment of the present invention. Thus, appearances of the phrases “in an embodiment” or “in some embodiments” in various places throughout this specification are not necessarily all referring to the same embodiment; these embodiments can be implemented individually or in conjunction with one or more other embodiments. Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, etc. In other instances, well-known structures or operations are not shown or described in detail to avoid obscuring aspects of the invention.

The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

1. A video processing method of sub-block motion compensation in a video coding system, comprising: receiving input video data associated with a current block in a current picture; partitioning the current block into multiple overlapped sub-blocks according to an overlapped sub-block partition, wherein each sub-block is overlapped with one or more other sub-blocks in a horizontal direction, a vertical direction, or both the horizontal and vertical directions; determining one or more sub-block Motion Vectors (MVs) associated with each sub-block of the overlapped sub-blocks in the current block; deriving an initial predictor for each sub-block of the overlapped sub-blocks by motion compensation using said one or more sub-block MVs; deriving a final predictor for each overlapped region by blending the initial predictors of the overlapped region; and encoding or decoding the current block based on the final predictors of the overlapped regions.
2. The method of claim 1, wherein the overlapped sub-block partition is explicitly signaled at a sequence level, picture level, tile group level or slice level of a video bitstream.
3. The method of claim 1, wherein the overlapped sub-block partition is implicitly decided according to one or a combination of motion information of the current block, a size of the sub-blocks, and a prediction mode of the current block.
4. The method of claim 1, wherein the overlapped sub-block partition is predefined and not signaled in a video bitstream.
5. The method of claim 1, wherein the final predictor is derived by blending the initial predictors of the overlapped region using weighted sum.
6. The method of claim 5, wherein weighting factors for the initial predictors are position dependent or depending on a number of overlapped sub-blocks.
7. The method of claim 1, wherein the current block contains the overlapped regions and non-overlapped regions, and the current block is encoded or decoded according to the final predictors of the overlapped regions and the initial predictors of the non-overlapped regions.
8. An apparatus of processing video data in a video coding system, the apparatus comprising one or more electronic circuits configured for: receiving input video data associated with a current block in a current picture; partitioning the current block into multiple overlapped sub-blocks according to an overlapped sub-block partition, wherein each sub-block is overlapped with one or more other sub-blocks in a horizontal direction, a vertical direction, or both horizontal and vertical directions; determining one or more sub-block Motion Vectors (MVs) associated with each sub-block of the overlapped sub-blocks in the current block; deriving an initial predictor for each sub-block of the overlapped sub-blocks by motion compensation using said one or more sub-block MVs; deriving a final predictor for each overlapped region by blending the initial predictors of the overlapped region; and encoding or decoding the current block based on the final predictors of the overlapped regions.
9. An apparatus of processing blocks with Overlapped Block Motion Compensation (OBMC) in a video coding system, the apparatus comprising one or more electronic circuits configured for: receiving input video data associated with a current block in a current picture; determining one or more Motion Vectors (MVs) for generating an OBMC region; generating one or more converted MVs by changing said one or more MVs to one or more integer MVs or changing a MV component of said one or more MVs to an integer component; deriving the OBMC region by motion compensation using said one or more converted MVs; applying OBMC by blending an OBMC predictor in the OBMC region with an original predictor; and encoding or decoding the current block.
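Purely as a non-limiting illustration, the MV conversion and blending operations recited in claim 9 may be sketched in program code as follows. The sketch assumes quarter-sample MV precision (two fractional bits), round-half-up rounding, clamping at the reference picture boundary, and a blending weight expressed in units of 1/32; these choices, as well as the names to_integer_mv, fetch_block, and blend_predictors, are hypothetical and are not taken from the claims or from any particular codec implementation.

```python
def to_integer_mv(mv, frac_bits=2):
    """Round each MV component to the nearest integer-sample position.

    The MV is stored in units of 1/(2**frac_bits) sample (assumed
    quarter-sample by default); the result keeps the same units, so its
    fractional bits are all zero. Ties round toward +infinity.
    """
    half = 1 << (frac_bits - 1)
    return tuple(((c + half) >> frac_bits) << frac_bits for c in mv)


def fetch_block(ref, x, y, width, height, mv, frac_bits=2):
    """Motion compensation with an integer MV: a plain sample copy.

    No interpolation filter is needed because the MV has no fractional
    part. Coordinates are clamped to the picture (an assumption).
    """
    dx, dy = mv[0] >> frac_bits, mv[1] >> frac_bits
    rows, cols = len(ref), len(ref[0])

    def clamp(v, hi):
        return max(0, min(v, hi - 1))

    return [
        [ref[clamp(y + dy + j, rows)][clamp(x + dx + i, cols)]
         for i in range(width)]
        for j in range(height)
    ]


def blend_predictors(original, obmc, obmc_weight, shift=5):
    """Weighted sum of the original and OBMC predictors.

    obmc_weight is in units of 1/(2**shift); the original predictor
    receives the complementary weight. A rounding offset is added
    before the right shift.
    """
    total = 1 << shift
    offset = 1 << (shift - 1)
    return [
        [(obmc_weight * o + (total - obmc_weight) * p + offset) >> shift
         for p, o in zip(row_p, row_o)]
        for row_p, row_o in zip(original, obmc)
    ]


# Hypothetical 4x4 reference picture whose sample value encodes its position.
reference = [[10 * r + c for c in range(4)] for r in range(4)]

converted = to_integer_mv((5, -3))            # (1.25, -0.75) samples
obmc_region = fetch_block(reference, 1, 1, 2, 1, converted)
blended = blend_predictors([[100, 100]], [[60, 60]], obmc_weight=8)
```

In this sketch the converted MV addresses whole samples only, so the OBMC region is obtained by copying reference samples rather than by running a fractional-sample interpolation filter.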